September 18, 2012
Bioinformatics: Pinto Ab NGS
Used the same background and gene list files as before and uploaded to DAVID. For the contigs with 0 matches across species, downloaded the Gene Ontology enrichment files BP1, BP2, BP3, BP4, and BP5. Used BP2 to make a pie chart using number of contigs as the size of the pie wedges. This is the new orphan gene pie chart for the manuscript.
Annotated files of SNPs discovered in each species (H. kamtschatkana and O. lurida) with SPID descriptions, GO and GO Slim. The files for pinto and oly can be found at the links on Dropbox.
Uploaded files with orphan genes for each species annotated with SPIDs and gene descriptions to Galaxy. Annotated with GO and GO Slim terms in Galaxy. Created list of non-redundant orphan contig-GO Slim terms and from that list made a pivot table and pie chart of GO Slims that are represented within each group of orphan genes.
September 12, 2012
Secondary Stress: Transcriptomics
Mailed sample for NGS to Eli Meyer. Sample numbers are listed below (n=4 each for ambient and elevated pCO2) with BC-MPX barcode numbers in parentheses. For each sample, thawed, flicked to mix, and aliquoted 5 µl into an eppie tube so that all samples were pooled in 1 tube. Mailed the tube on dry ice.
Samples:
Exp2.1 (1-1)
Exp2.4 (51-1)
Exp2.7 (52-1)
Exp2.10 (53-1)
Exp2.217 (57-2)
Exp2.220 (58-2)
Exp2.235 (54-3)
Exp2.238 (55-3)
August 30, 2012
Secondary Stress: Proteomics
Redid PCA using a variance-covariance matrix. Also did PCA for the proteins just involved in stress response (averaged spectral counts within treatments) and for individual oysters (n=16) for all proteins.
For the PCA of all proteins and spectral counts averaged within treatments, the following eigenvector loadings were the highest:
PC1: AY256853.p.cg.8_4 (superoxide dismutase) -0.138
ES789884.p.cg.8_8 (alpha tubulin) 0.249
CU686207.p.cg.8_8 (myosin) -0.117
CU991685.p.cg.8_6 (unknown) -0.410
PC2: BQ427067.p.cg.8_5 (actin) 0.107
BQ426898.p.cg.9_7 (unknown) 0.118
CU984218.p.cg.8_6 (peroxiredoxin 6) -0.151
BQ426757.p.cg.8_14 (myosin) 0.144
PC3: ES789884.p.cg.8_8 (alpha tubulin) -0.101
BQ426898.p.cg.9_7 (unknown) -0.175
FU6OSJA01BC2WI.p.cg.8_3 (unknown) -0.152
CU989410.p.cg.8_6 (unknown) -0.115
FU6OSJA02I1J4K.p.cg.8_7 (60s ribosomal protein) -0.109
Enrichment analysis of protein sets that are 2- and 4-fold differentially expressed between treatments. Calculated differential expression between treatments using the spectral counts averaged within treatment groups. Made protein sets for the following comparisons (separate lists for >4-fold and >2-fold expression differences): ambient pCO2 vs. high; high pCO2 vs. high mechanical stress; ambient pCO2 vs. ambient mechanical stress. For each comparison, the spectral counts for the second stressor listed were divided by the first (i.e. the question is which proteins are up-regulated in high pCO2 and under mechanical stress within both pCO2 treatments). Uploaded this list to Galaxy and joined with blastp results to get SPIDs associated with the contig numbers. Used the SPIDs as input into DAVID's functional analysis tool and exported the GO FAT table to put into REVIGO.
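A minimal Python sketch of the fold-change step described above. The protein names and counts are made-up placeholders, and skipping proteins with a zero count in the denominator is an assumption; the notebook does not say how those were handled.

```python
def fold_change_sets(first, second, thresholds=(2.0, 4.0)):
    """Divide the averaged spectral counts of `second` by those of `first`
    and collect proteins whose ratio exceeds each threshold."""
    sets = {t: [] for t in thresholds}
    for protein, base in first.items():
        if base == 0 or protein not in second:
            continue  # assumption: ratios with a zero denominator are skipped
        ratio = second[protein] / base
        for t in thresholds:
            if ratio > t:
                sets[t].append(protein)
    return sets

# Hypothetical averaged spectral counts, not real data.
ambient = {"protA": 2.0, "protB": 10.0, "protC": 1.0, "protD": 0.0}
high_pco2 = {"protA": 5.0, "protB": 11.0, "protC": 4.5, "protD": 3.0}

result = fold_change_sets(ambient, high_pco2)
```

Here `result[2.0]` holds the proteins that are more than 2-fold higher in the second treatment and `result[4.0]` the more-than-4-fold subset.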
August 29, 2012
Secondary Stress: Proteomics
From each oyster took list of expressed proteins and combined into one list (keeping only unique values) to create an expressed proteome for the experiment. Joined this file with the blastp results, GO and GO Slim.
For each oyster, joined the 3 technical replicates into one file based on the expressed proteome for that particular treatment group (i.e. the 4 sub-proteomes for each treatment). Averaged the spectral counts across technical replicates.
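The averaging across technical replicates could be sketched in Python like this. The counts are hypothetical, and treating a protein that is missing from a replicate as a count of 0 is an assumption about how the join was filled in.

```python
def average_replicates(replicates, proteome):
    """Average spectral counts across technical replicates for every protein
    in the expressed proteome. A protein absent from a replicate counts as 0
    (assumption, not stated in the notebook)."""
    n = len(replicates)
    return {p: sum(rep.get(p, 0) for rep in replicates) / n for p in proteome}

# Hypothetical spectral counts for the 3 technical replicates of one oyster.
rep1 = {"protA": 6, "protB": 3}
rep2 = {"protA": 9}
rep3 = {"protA": 3, "protB": 3, "protC": 3}
proteome = ["protA", "protB", "protC", "protD"]

averaged = average_replicates([rep1, rep2, rep3], proteome)
```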
Joined together all averaged spectral counts to the expressed proteome.
Did enrichment analysis in DAVID on each treatment, on OA (combined 2800 and 2800 + mechanical stress), and mechanical stress (m.s. from 400 and 2800 µatm).
Did PCA on the averaged spectral counts across technical and biological replicates with treatment group as the observation and protein as the variable. (Deleted all proteins with 0 expression across treatments.)
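For reference, a small pure-Python sketch of what a PCA on a variance-covariance matrix does, with treatment groups as rows (observations) and proteins as columns (variables). This is not the software actually used for the analysis (the notebook does not name it); power iteration is swapped in here only to keep the example self-contained, and the data are invented.

```python
import math

def covariance_matrix(rows):
    """Variance-covariance matrix of the columns (variables) of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iterations=200):
    """PC1 loadings (unit eigenvector) and its eigenvalue by power iteration.
    Assumes the start vector is not orthogonal to the leading eigenvector."""
    p = len(cov)
    vec = [float(i + 1) for i in range(p)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0:
            break  # no variance left to explain
        vec = [x / norm for x in product]
        eigenvalue = norm
    return vec, eigenvalue

# Hypothetical averaged spectral counts: 3 observations x 3 proteins.
counts = [[1, 1, 5], [2, 2, 5], [3, 3, 5]]
loadings, eigenvalue = first_component(covariance_matrix(counts))
```

The third protein has no variance across observations, so its loading on PC1 is 0, while the two co-varying proteins share the loading equally.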
August 27, 2012
Secondary Stress: Proteomics
Downloaded all raw data files from the Orbitrap and from the Q Exactive to an external hard drive. These files will be necessary for doing comparative expression analysis in Skyline.
Brought back remaining samples in autosampler vials (from both MS machines). Put these in the proteomics box in the -80°C.
Steven ran a blastp of the translated Sigenae database against Swissprot. See his notebook here and the file here. I downloaded this file and edited it so that the only 4 columns remaining are the contig number, the swissprot ID corresponding to the top blast hit, the e-value, and the bit score. This file is saved as Sigenae blastp (tab delimited).
Downloaded all protein prophet files with probability cut-off of 0.9.
Secondary Stress: Histology
Analyzed slides sent out 8/20 (see 8/17 for analysis). Some of the sections, especially for the high pCO2 treatment, are not very good and don't include the digestive gland. Sample sizes for DG for each treatment are n=6 for 400 µatm, n=6 for 400 µatm + mechanical, n=2 for 1400 µatm, n=3 for 1400 µatm + mechanical. Also, in some of the sections the tissue is either not preserved well or the slides weren't made well and there are tears so it's hard to score the tubules as metaplasia or not.
August 23, 2012
Secondary Stress: Proteomics
All samples have now gone through the first injection (2nd injection has started) and there was strong peptide signal for all. The 3rd standard has run and looks good: the 2 standard peaks are still separate and the signal is clean after the high organic wash, but there is still some carryover signal before the wash.
Overview of software
Skyline for doing relative protein expression between samples (more robust than spectral counting because it takes the area under the spectral curve for comparison). Only runs on PCs. Use MS1 Full-Scan Filtering (pdf of manual). Notes: 1) use mzxml and peptide.xml files for the library; raw files are imported as the data to generate the protein expression profiles. 2) In peptide settings, structural modifications only include carbamidomethyl cysteine and methionine oxidation.
In Peptide Prophet, the number after an amino acid indicates that it has been modified. These modifications (circled in pink) frequently occur during sample preparation.
In Peptide Prophet file can view the sensitivity/probability curve (just click on probability in the row of the peptide you are interested in). This is good for supporting why a certain probability cutoff was chosen. Gives false discovery rate for each probability. Sensitivity = fraction of all correct assignments passing filter; error = fraction of peptide assignments passing filter that are incorrect.
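To make the two definitions above concrete, here is a toy Python sketch that computes sensitivity and error at a probability cutoff. It uses invented assignments with known correct/incorrect labels purely for illustration; real searches have no such labels, and this is not how Peptide Prophet itself arrives at its curve.

```python
def sensitivity_and_error(assignments, cutoff):
    """assignments: list of (probability, is_correct) pairs.
    Sensitivity = correct assignments passing the filter / all correct assignments.
    Error = incorrect assignments passing the filter / all assignments passing."""
    passing = [ok for prob, ok in assignments if prob >= cutoff]
    total_correct = sum(1 for _, ok in assignments if ok)
    correct_passing = sum(1 for ok in passing if ok)
    sensitivity = correct_passing / total_correct if total_correct else 0.0
    error = (len(passing) - correct_passing) / len(passing) if passing else 0.0
    return sensitivity, error

# Hypothetical peptide assignments: (probability, is_correct).
toy = [(0.99, True), (0.95, True), (0.92, False), (0.60, True), (0.30, False)]
sens, err = sensitivity_and_error(toy, cutoff=0.9)
```

With a 0.9 cutoff, 2 of the 3 correct assignments pass (sensitivity 2/3) and 1 of the 3 passing assignments is wrong (error 1/3).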
In the Peptide Prophet file can also generate a 3D image of the peptides. Green-red corresponds to peptide probability. Blue dots are peptides that were not identified in the database search. The x-axis is the time of the chromatography. Contamination would be seen as dark horizontal lines across the plot.
Sensitivity/error can also be found in the Protein Prophet file.
Notes on standardization of MS data (from discussions with Jimmy and Priska):
-can normalize spectral counts to length of protein since longer proteins will have greater hits
-can create histogram of number of spectral counts on the x-axis and proteins on the y-axis and just analyze the middle of that distribution
-can normalize to housekeeping proteins - I like this one best. You can choose a few housekeeping proteins (actin, tubulin, etc.) and use the spectral counts for those proteins to create a scalar to correct spectral counts for other proteins that are potentially differentially expressed.
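A rough Python sketch of the housekeeping option from the notes above. How the scalar is defined is an assumption (here: mean housekeeping total across samples divided by each sample's housekeeping total); sample names, protein names and counts are placeholders.

```python
def housekeeping_scalars(samples, housekeeping):
    """One scalar per sample: the mean housekeeping total across samples
    divided by that sample's own housekeeping total (assumed definition)."""
    totals = {name: sum(counts[p] for p in housekeeping)
              for name, counts in samples.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {name: mean_total / total for name, total in totals.items()}

def normalize(samples, housekeeping):
    """Multiply every spectral count in a sample by that sample's scalar."""
    scalars = housekeeping_scalars(samples, housekeeping)
    return {name: {p: c * scalars[name] for p, c in counts.items()}
            for name, counts in samples.items()}

# Hypothetical spectral counts for two samples.
samples = {
    "s1": {"actin": 40, "tubulin": 20, "protX": 10},
    "s2": {"actin": 80, "tubulin": 40, "protX": 10},
}
normalized = normalize(samples, housekeeping=["actin", "tubulin"])
```

In this toy case s2 has twice the housekeeping signal of s1, so the same raw count of protX ends up half as large in s2 after correction.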
August 22, 2012
Secondary Stress: Proteomics
All samples that ran last night (400 µatm and 400 µatm + mechanical stress) collected lots of peptides and look good. A standard ran this morning before beginning on MS of 1400 µatm samples. The standard looks a little different from yesterday's: the 2 peaks of angiotensin and neurotensin are still separated, but there is more noise after the peaks. The organic wash is still visible and MS equilibrates back to 0 after the wash. The noise after the standard peaks is carryover from the previous sample(s). Since we are looking at larger differences in protein expression, this should not significantly affect our results. The bias of the carryover will also partially be corrected for by randomizing the order in which the samples are injected among the 3 technical replicates.
Subsampled 25 µl of each sample from the autosampler vials and put into new autosampler vials. Priska is running these on the Q Exactive so we can do a comparative analysis of the protein identifications between the 2 machines (this also means I get twice as much data for the same price :) ).
Made translated Sigenae database in Galaxy. File from yesterday finally finished uploading (82,312 sequences). For the translation in EMBOSS's getorf package, used standard code, minimum nucleotide size of ORF to report = 30, maximum nucleotide size of ORF to report = 1 million, translation of regions between start and stop codons, all start codons code for methionine, no circular sequence, find ORFs in reverse complement, number of flanking nucleotides to output = 100, output = FASTA. The output (Translation of Sigenae v 8) is 1,060,291 sequences.
Jimmy ran SEQUEST and Peptide and Protein Prophet on my first 6 MS runs. Data are downloaded by logging into the UWPR projects website and following link at the bottom of the project page. I selected to download proteins that meet the minimum probability of 0.3. I am only working with the Protein Prophet files at this point.
Order of samples for 2nd injection: 32, 26, 8, 2, 248, 242, 227, 221, 35, 29, 11, 5, 251, 245, 230, 224
Order of samples for 3rd injection: 251, 2, 248, 5, 245, 8, 242, 11, 230, 26, 227, 29, 224, 32, 221, 25
1 standard is run after every 8 samples
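One way to generate an injection order like the ones above, sketched in Python. The notebook does not say how the listed orders were actually chosen, so the random shuffle, the `seed` parameter and the "STD" label are all assumptions for illustration.

```python
import random

def injection_schedule(samples, standard_every=8, seed=None):
    """Shuffle the sample order and add a standard after every
    `standard_every` samples."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    schedule = []
    for i, sample in enumerate(order, start=1):
        schedule.append(sample)
        if i % standard_every == 0:
            schedule.append("STD")  # hypothetical label for the standard run
    return schedule

samples = [2, 5, 8, 11, 26, 29, 32, 35, 221, 224, 227, 230, 242, 245, 248, 251]
schedule = injection_schedule(samples, seed=1)
```

Using a different seed for each technical replicate gives a different order per injection round while keeping a standard after every 8 samples.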
5 most highly expressed proteins for each of the samples run so far: sorted the Protein Prophet files by number of independent spectra (largest to smallest). Used blastp to find the protein that corresponds to the translation of the indicated contig. Below, the 5 most highly expressed proteins are listed beneath the sample number. Contig number is given first with the number of independent spectra in parentheses, followed by the protein name and its accession number.
101B_2
AY256853.p.cg.8 (68) - extracellular superoxide dismutase (Q08420.2, 1E-11) and AY551094 (68) - same hit (6E-12)
AF026063 (54) - actin in C. gigas (O17320.1, 0)
AM857656 (43) - myosin (Q9JLT0.1, 0)
CU683354 (31) - actin (P18091.2, 0)
ES789884 (29) - tubulin (P68370.1, 0)
101B_5
AY256853 (63) and AY551094 (63) - SOD (see above)
ES789884 (48) - tubulin (see above)
AF026063 (47) - actin (see above)
AB118650 (43) - arginine kinase (O15990.1, 3E-168)
AM857656 (35) - myosin (see above)
CU683354 (29) - actin (see above)
AB196534 (28) - beta tubulin (P11833.1, 0)
AF144646 (26) - HSP70 (Q9U639.1, 0)
EW778673 (26) - ATP synthase, subunit B (Q25117.1, 0)
EW779263 (25) - protein disulfide-isomerase (P04785.2, 0)
103B_221
AY25685 (50) - SOD (see above)
ES789884 (40) - tubulin (see above)
AF026063 (39) - actin (see above)
AM857656 (34) - myosin
BQ426757 (29) - myosin (P24733.1, 0)
CB617458 (28) - phosphoenolpyruvate carboxykinase (P29290.1, 0)
CU683354 (28) - actin
EW778673 (27) - ATP synthase
AB118650 (24) - arginine kinase
AJ544886 (23) - phosphoenolpyruvate carboxykinase (P29290.1, 0)
CU682628 (22) - plasminogen (P06868.2, 1E-39)
AB122067 (22) - beta tubulin (P11833.1, 0)
August 21, 2012
Secondary Stress: Proteomics
Made 2% acetonitrile (ACN) in water with 0.1% FA and resuspended all the samples in 100 µl (vortexed to mix). Spun samples down at 15,000 rpm for 10 minutes. Removed supernatant (NB: no precipitate was visible, but left a small amount of liquid in the bottom of the tubes just in case) and transferred to autoinjection vials, being careful not to get any bubbles in the very narrow part in the bottom. Loaded the vials in tray 2, rows B and C (see below for layout). For the first injection, only ran 1 sample to make sure that everything went ok (run began around 11:30 am). The column used (packed yesterday) is 35 cm long (better for complex mixtures). The column is lined up about 2-3 mm from the entry into the MS.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| B | 221 | 224 | 227 | 230 | 242 | 245 | 248 | 251 |
| C | 2 | 5 | 8 | 11 | 26 | 29 | 32 | 35 |
In the Nanoacquity software, the biosolvent manager is used to adjust flow. Adjust until you get 0.3 µg/mL and <2000 psi of pressure. Always leave on solvents A1 and B1.
Standard used is made of angiotensin and neurotensin. They are very close together and if there is something wrong with the column their peaks will be indistinguishable.
After the high percentage organic wash on the column (near the end of the injection), make sure that the MS equilibrates back to 0.
Spray voltage should be between 1.8-2 kV.
Data display should be in centroid mode so that it saves only the location and intensity of the peaks.
Number of microscans adds scans together and saves as one scan.
Scan 1 happens in the Orbitrap (400-2000 m/z). The Orbitrap determines how large ions are depending on their speed. The other scans are dependent on this first one and happen in the ion trap.
Set repeat count at 1 in dynamic exclusion - do not need to sequence the same precursor multiple times.
Make sure that the pressure does not go up too much after the first run. The pressure started at 1911 psi for the first run today.
I stayed through injection of the 3rd sample. Lots of peptides were collected for the first 2 samples (see plot below) and the column pressure is not going too high (i.e. it is staying below 2000 psi).
Made a database to search proteomics data against. Made a new workflow in Galaxy called "C. gigas Proteomics" and uploaded Sigenae_v8_contigs.fasta. Ooops, galaxy is down. Will pursue this tomorrow.
August 20, 2012
Secondary Stress: Proteomics
Went to UWPR (UW Proteomics Resource) this morning to pack my columns with Priska. The columns are made from very thin coated silica tubing packed with tiny beads (Magic C18). The tubing is cut and melted by a laser while being stretched to create a break with a tapered point at the end. A slurry of the beads in EtOH is placed in a chamber that is closed except for a tiny opening where the silica tubing can be fed through. The non-tapered end is fed into the slurry of the beads and then helium gas (1000 psi) is used to bring up the pressure in the chamber and drive the beads up the column. The column is left like this for a while to make sure that it fills completely and the beads pack well.
Secondary Stress: Histology
Sent out 16 more samples for histology. Sent out 4 samples each from 4 different treatments from the sampling done on 2.11.12. For 400 µatm sent out H2.96, 93, 74 and 80; for 400 µatm + mechanical stress sent out 87, 88, 85, and 84; for 1400 µatm sent out 21, 8, 3, 1; and for 1400 µatm + mechanical stress sent out 9, 15, 14, and 13. Samples were wrapped in 70% EtOH soaked paper towels. EtOH paper towels were placed in the bottom of a plastic histo jar, the samples were placed on top, and then more EtOH soaked towels were used to pack it in. The sealed jar was placed in a ziploc filled with desiccant and then double bagged in another ziploc. The bagged samples were put in a cooler in a box, sealed, affixed with the appropriate warning indication, and mailed priority overnight.
August 17, 2012
Secondary stress: Proteomics
Day 4 of prep for MS/MS: desalting. Solvent A = 80% acetonitrile (ACN) + 0.1% trifluoroacetic acid (TFA); solvent B = 5% ACN + 0.1% TFA. Before Brook froze the samples yesterday she added 100 µl of solvent B. I added 100 µl more this morning before desalting.
Macrospin columns were prepped to make C18 structure open up. Added 200 µl of solvent A to columns and spun down at 2000 rpm for 3 minutes. Repeated 3 more times. Added 200 µl solvent B to the column and spun down at 2000 rpm for 3 minutes. Repeated 2 more times. Added entire volume (200 µl) of protein digest and spun down at 3000 rpm for 3 minutes. Collected flow-through from tube and spun down again. Washed column with loading volume of solvent B at 3000 rpm for 3 minutes and repeated two more times. (Saved flow-through from previous 2 steps.) Put tubes in a clean collection tube. In two separate additions, added 100 µl of solvent A and spun down at 3000 rpm for 3 minutes. Discarded columns. Put collected sample in speed vac for >45 minutes to dry, but do not dry completely because pellet could jump out of the tube. Stored at -80°C.
Secondary stress: histology
Took pictures of histo slides (5/24/12). 3 pictures each (when possible) of digestive gland tubules and gill at 10x. Analyzed DG tubules looking for metaplasia.
August 16, 2012
Secondary stress: Proteomics
Brook put my samples in the speed vac this morning to evaporate the liquid. Since my samples were low in volume (see yesterday) she added some dilute formic acid. The samples will be centrifuged under vacuum until most of the liquid is gone. They will be stored at -80°C overnight.
August 15, 2012
Secondary Stress: Proteomics
Day 2 of sample prep for MS/MS. Took samples from yesterday and added 6.6 µl of 1.5 mM Tris pH 8.8. Then added 2.5 µl of 200 mM TCEP (1 M made yesterday and diluted today). Vortexed samples. Tested pH of a couple samples and was ~8-8.5 (target pH). Incubated samples on shaker at 37°C for 1 hour. Added 20 µl of 200 mM iodoacetamide (IAM; 1 M made yesterday and diluted today). This alkylates the proteins. Vortexed and allowed to sit for 1 hour at room temp in a dark drawer. Added 20 µl of 200 mM dithiothreitol (DTT; made yesterday). Vortexed and let sit for 1 hour at room temp. This absorbs excess IAM. Since protein concentrations are so high (~3000 µg/ml) continued protocol with ~1/4 of the solution (50 µl equal to about 100 µg/ml). Added 200 µl of 25 mM ammonium bicarbonate (dilutes urea). Added 50 µl of HPLC grade MeOH. Vortexed. Added trypsin buffer (20 µl) to a bottle of trypsin and lightly vortexed. Aliquoted 3 µl of trypsin to each tube (this is equal to 3 µg of trypsin, aiming for a 50:1 protein:trypsin). Vortexed. Incubated overnight at 37°C.
August 14, 2012
Secondary Stress: Proteomics
Began protein sample prep for MS/MS next week. Chose 16 samples to analyze: 4 control (103B) = Exp2.221, 224, 227, and 230; 4 control + mechanical stress = 242, 245, 248, 251; 4 highest pCO2 (101B): 2, 5, 8, 11; 4 highest pCO2 + mechanical: 26, 29, 32, 35. Homogenized gill samples in 100 µl of 50 mM NH4HCO3 with an RNase-free pestle. Sonicated each sample 4 times, keeping on dry ice in between sonications. Sonicator probe was cleaned with methanol between samples and with methanol and water between treatment groups. Measured protein concentration of homogenized samples with a Bradford assay, following the Pierce protocol. Made standards A-I as detailed in the protocol. Diluted 5 µl of either standard or sample in 250 µl Coomassie Reagent and mixed by inverting 4 times. Measured each sample in triplicate on the nanodrop, inverting 4 times between each measurement. From sample 26-251 pipetted samples up and down to mix before first aliquot (mixed by inversion before subsequent aliquots) because aggregations were beginning to form in samples. Quant data here. Below is the standard curve based on the average absorbance values for each known standard concentration. The averages are corrected for background by subtracting the average absorbance of the blank (I). From this curve, concentrations for the experimental samples were calculated using the equation x (concentration) = (absorbance-0.019)/5E-5.
| Sample | Concentration (µg/ml) |
| 2 | 3020.0 |
| 5 | 2740.0 |
| 8 | 2793.3 |
| 11 | 2886.7 |
| 26 | 2940.0 |
| 29 | 2553.3 |
| 32 | 2373.3 |
| 35 | 2693.3 |
| 221 | 2740.0 |
| 224 | 2720.0 |
| 227 | 2760.0 |
| 230 | 2886.7 |
| 242 | 2583.3 |
| 245 | 2893.3 |
| 248 | 2833.3 |
| 251 | 2840.0 |
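The standard-curve equation used for the concentrations above, x = (absorbance-0.019)/5E-5, written as a small Python function. The slope and intercept come from the notebook; the absorbance value in the example is illustrative (back-calculated from one of the tabulated concentrations), not a recorded reading.

```python
def concentration_from_absorbance(absorbance, intercept=0.019, slope=5e-5):
    """Invert the linear standard curve absorbance = slope * conc + intercept.
    Returns concentration in µg/ml for a blank-corrected absorbance."""
    return (absorbance - intercept) / slope

# Illustrative absorbance only; it corresponds to about 3020 µg/ml.
example_conc = concentration_from_absorbance(0.170)
```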
Added 36 mg (~0.036 g) of urea to each 100 µl sample to bring the concentration to 6 M urea. Stored at -80°C.
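A quick check of the urea amount in Python, assuming a molar mass of 60.06 g/mol for urea and ignoring the volume that the dissolved urea itself adds to the sample.

```python
UREA_MOLAR_MASS = 60.06  # g/mol

def urea_mass_mg(volume_ul, molarity):
    """Mass of urea (mg) needed to reach `molarity` in `volume_ul` of sample,
    ignoring the volume change from dissolving the urea (assumption)."""
    liters = volume_ul * 1e-6
    return molarity * liters * UREA_MOLAR_MASS * 1000.0

mass = urea_mass_mg(100, 6)  # about 36 mg for 100 µl at 6 M
```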
August 10, 2012
Bioinformatics: Pinto Ab NGS
Submission to NCBI's SRA
Started a submission called "PNW Larval RNA-Seq". For each species, created a BioProject, a BioSample, an Experiment and a Run. For each library of data (2 for each), created a Data Block within the Run. In the Run, information is needed on flow cell and lane for Illumina data. This can be found at the beginning of the raw fastq file.
Checksums were generated using the program MD5.
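An equivalent MD5 checksum can also be computed with Python's standard `hashlib` module. This is an alternative sketch, not the program used for the submission; the temporary file below just stands in for a fastq file.

```python
import hashlib
import os
import tempfile

def md5_checksum(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a small temporary file standing in for a real data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_checksum = md5_checksum(tmp.name)
os.remove(tmp.name)
```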
August 9, 2012
Bioinformatics: Pinto Ab NGS
For the previous enrichment analyses I did, I neglected to use the e-value cut-off. I remade the files and re-did the DAVID analysis for the 0 matches for both Oly and pinto. SPID lists for backgrounds and matches are non-redundant.
Deleted previous files from Galaxy workflow and annotated enrichment files from DAVID to GO Slim. Made table for GO Slims across all matching categories (0-6).
For oly, the contigs with 6 matches (i.e. across all species) were not enriched for any GO terms.
Made stacked bar plots with number of cross-species matches on the x-axis and proportion of each set corresponding to GO Slim terms on the y-axis. The colors correspond to the following GO Slims:
cell adhesion = light brown
cell cycle & proliferation = light blue
cell organization and biogenesis = violet
cell-cell signaling = pale green
death = light coral
developmental processes = tan
protein metabolism = teal
RNA metabolism = dark yellow
signal transduction = light lavender
stress response = dark red
transport = pink
Also made stacked bar plots of GO Slim terms corresponding to number of matches not based on enrichment. Filtered contig number and its associated GO Slim by SPID and discarded all contigs that didn't meet the cut-off (1e-5). Kept only unique contig-GO Slim associations (Hkam GO Slim, Oly GO Slim). Did the same for contigs and number of cross-species matches (Hkam matches, Oly matches). Uploaded both of these files to Galaxy and joined based on contig number.
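The filter-and-deduplicate step described above was done in a spreadsheet and Galaxy; the same logic can be sketched in Python as follows. The row layout (contig, GO Slim, e-value) and the example rows are hypothetical.

```python
def unique_contig_goslim(rows, cutoff=1e-5):
    """rows: iterable of (contig, go_slim, evalue).
    Drop rows whose e-value is greater than the cutoff, then keep each
    contig-GO Slim pair once, preserving first-seen order."""
    seen = set()
    kept = []
    for contig, go_slim, evalue in rows:
        if evalue > cutoff:
            continue
        pair = (contig, go_slim)
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept

# Hypothetical annotation rows.
rows = [
    ("contig1", "stress response", 1e-20),
    ("contig1", "stress response", 1e-8),   # redundant pair
    ("contig1", "transport", 1e-20),
    ("contig2", "death", 0.01),             # fails the cut-off
    ("contig3", "transport", 1e-5),
]
pairs = unique_contig_goslim(rows)
```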
August 7, 2012
Olympia oyster OA
SEM on Carolyn's Oly larvae from her May experiment. Larvae had been stored in EtOH so I sucked up a little bit of larvae (there were large numbers of them at the bottom of each tube) and streaked them across the SEM stub. The EtOH was allowed to evaporate off before loading sample in SEM. Took pictures of the samples 400-D2L and 2200-8 (pictures are saved in the SEM folder "Emma T-S" in "Oly Larvae"). For each larva took a picture of the entire shell at 500x and took a picture of the growth lines on the shell at 3000x. For 400-D2L the following photo numbers are of the same larvae: (1,2,9), (3,4,8), (5-7), (11&12), (13&14), (15&16), (17-19), (20-23), (24&25), (26&27), (28-30), (31&32), (33&34), (35&36), (37-39), (40&41), (42&43), (44&45), (46-48). For 2200-8, the pictures were all taken consecutively and pictures of a new larva begin with a picture of the entire shell followed by zoomed in pictures. The layout of the samples on the stubs is below:
400-D2L | 400-D1L | 400-8 | 400-7 | 400-6 | 1000-8 | 1000-7 | 1000-6 | 1600-8 | 1600-7 | 1600-6 | 2200-8
August 3, 2012
Oyster Transport: Willapa Bay
Went down to Willapa Bay to pick up 400 oysters to use as broodstock in the cross-generation OA study. The oysters had been collected (multiple year classes) by Alan Trimble from 2 different locations in Willapa (Baby Island and another one). We collected 200 from each group and kept them separate throughout cleaning and transport. We picked the oysters at low tide (~9 am) and then brought them to the WDFW office to clean. For each oyster, we scraped off epibionts, with special attention paid to oyster drills and their eggs. Oysters were then soaked in fresh water for 10 minutes to make sure they were closed and transferred to a dilute bleach solution for 1 hour (20.8 mL of bleach in 5 gallons of fresh water). We tried to keep them in the shade throughout. After the bleach soak, we rinsed the oysters with fresh water, they were checked by a WDFW employee, and we packed them in coolers covered with a damp towel and ice packs. We then drove them up to Bainbridge Island and handed them off to Joth.
August 2, 2012
Bioinformatics: Pinto Ab NGS
Re-did Cross-species ortholog plots with corrected contig lists.
Started submission process on NCBI SRA for both Oly and Pinto (I think...the user interface on NCBI is seriously lacking). Have not uploaded any data yet.
Analyzed contig blastx results to identify gene categories to include in the discussion. Removed blast hits that had an e-value greater than 1e-5. For pinto, start by looking for contigs associated with reproduction. Scanned the GO terms associated with contigs to find promising ones: in utero embryonic development, meiosis, spermatid development, spermatogenesis. Then searched list of GO numbers for ones associated with reproduction:
reproduction = GO: 0000003
sexual reproduction = GO: 0019953
multicellular organism reproduction = GO:0032504
cellular processes involved in reproduction = GO: 0048610
developmental processes involved in reproduction = GO: 0003006
cellular processes involved in reproduction in a multicellular organism = GO:0022412
None of these terms were in the pinto contig list. The contigs that match reproduction-linked GO terms are #1478 (in utero embryonic development); 3423, 1478, and 3832 (meiosis); 3546, 7976 (spermatid development and spermatogenesis).
Also found contigs related to reproduction by searching for the words "sperm", "ova", "ovum", "egg", "fertilization", "vitellogenin" (not all of these terms returned results). Contigs found were 4476 (sperm flagellar protein), 6342 (motile sperm domain-containing protein), 7415 (sperm receptor for egg jelly), 181 (egg protein).
66 pinto abalone contigs matched the GO Slim term "stress response". These included genes that are homologous to genes in the oxidative stress and oxidative response pathway, ubiquitination, apoptosis, a variety of chaperones.
Did the same analysis for Oly. Contigs were found for the following GO reproduction-related terms: cell differentiation involved in embryonic placenta development, copulation, DNA methylation during gametogenesis, embryo implantation, embryonic development, embryonic development during birth or egg hatching, embryonic hemopoiesis, embryonic limb morphogenesis, female pregnancy, fusion of sperm to egg plasma membrane, germ-line sex determination, in utero embryonic development, male gonad development, male meiosis, meiosis, meiotic spindle organization, oogenesis, ovarian cumulus expansion, ovarian follicle development, ovulation cycle, parturition, single fertilization, sperm motility, spermatid development, spermatogenesis, zymogen granule membrane.
July 31, 2012
Bioinformatics: Pinto Ab NGS
Did the enrichment analysis described yesterday for pinto abalone. Below are the REVIGO plots for 0 matches (first plot) and 6 matches across species (2nd plot).
Continuation of enrichment analysis. In Galaxy, joined the list of background SPIDs (generated for DAVID) with GO and GO Slim associations (files 49 and 51). DAVID generated lists of GO terms that were enriched for each matching category (0-6). In Galaxy, joined these GO terms with GO Slim terms (files 55, 62-67). Joined each file of enriched GO/GO Slim terms with the background file based on GO term (files 68-74). In Excel, created pivot tables for each category of GO enrichment based on GO Slim terms, i.e. each enrichment category is GO terms that are associated with contigs that matched no other species' contigs through contigs that matched all 6 other species. Removed other biological and other metabolic processes and made pie charts. Made a table to summarize all these results with number of cross-species matches as the rows (0-6) and GO Slim categories as the columns.
Same enrichment analysis was done for H. kamtschatkana. File number for background SPIDs joined to GO and GO Slim are 76 & 77. Enrichment by cross-species matches joined to GO Slim are files 85-91. The above files joined together are files 92-98. Pie charts could not be made for 2 and 3 match enrichments because there were only 2 categories (they are represented in the table).
July 30, 2012
Bioinformatics: Pinto Ab NGS
Made box plots of evalues by GO Slim term for pinto and Oly. Only GO Slim terms that correspond to biological processes were included. Redundant contig-GO Slim pairings were removed. Log evalues were plotted in the box plots. Whiskers on box plot extend to most extreme values. For log evalues, all unknown numbers (#NUM!) were replaced by 0 (these correspond to evalues=0).
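The log transform with the zero substitution described above, as a Python sketch. Using log base 10 is an assumption (the notebook does not say which base was used). Note that mapping e-values of 0 to a log value of 0 places those strongest hits at the same position as an e-value of 1.

```python
import math

def log_evalue(evalue):
    """log10 of an e-value. E-values of exactly 0 (the #NUM! cells) are
    replaced by 0, as described in the notebook."""
    return math.log10(evalue) if evalue > 0 else 0.0

# Hypothetical e-values, including one reported as 0.
logged = [log_evalue(e) for e in (1e-5, 1e-50, 0)]
```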
Enrichment analysis for contigs that match to 0-6 other species. Removed blast hits that did not make e-value cut-off (1e-5) and found how many matches each O. lurida or H. kamtschatkana contig had across species. In DAVID, uploaded all gene SPIDs as background and then did 7 different analyses with genes with 6, 5, 4, 3, 2, 1, or 0 matches across other species blast results. So far this analysis has been completed for O. lurida, not for H. kamtschatkana. Below are the REVIGO visualizations (with p-values from DAVID) for contigs that don't match to any other species (0 matches) and the second plot is contigs that match across all species comparisons.
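Counting how many other species each contig matches, and grouping contigs by that count, could look like this in Python. The data structure (one dictionary of best-hit e-values per species) and the species and contig names are hypothetical; only the 1e-5 cut-off comes from the notebook.

```python
def count_matches(contigs, blast_by_species, cutoff=1e-5):
    """Number of species in which each contig has a hit meeting the cut-off.
    blast_by_species maps species -> {contig: best e-value}."""
    counts = {c: 0 for c in contigs}
    for hits in blast_by_species.values():
        for contig, evalue in hits.items():
            if contig in counts and evalue <= cutoff:
                counts[contig] += 1
    return counts

def group_by_matches(counts):
    """Map each match count to the list of contigs with that count."""
    groups = {}
    for contig, n in counts.items():
        groups.setdefault(n, []).append(contig)
    return groups

# Hypothetical data with 3 species (the real analysis used 6).
contigs = ["c1", "c2", "c3"]
blast_by_species = {
    "sp1": {"c1": 1e-30, "c2": 1e-3},
    "sp2": {"c1": 1e-10},
    "sp3": {"c1": 1e-6, "c2": 1e-7},
}
groups = group_by_matches(count_matches(contigs, blast_by_species))
```

Each group (0 matches, 1 match, and so on) is then one gene list for DAVID, with the SPIDs of all contigs as the background.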
July 27, 2012
Bioinformatics: Pinto Ab NGS
Steven did blast of O. lurida transcriptome against H. kamtschatkana. See his notebook.
In Galaxy, added this new blastn to the Oly contig file joined with other blast results (Galaxy 155).
Original files used to join in Galaxy and do downstream analyses were the files that resulted from the blastx search against SwissProt. This means that some of the contigs are missing if they did not match up with a SPID. Need to redo all the joining in galaxy using the original list of contig names and redo necessary analyses downstream.
Started new Galaxy work flow called PNW Genomics. (1) Uploaded contig list to Galaxy (2) as well as file of SwissProt blast results (3) annotated to gene name. (4) Annotated to GO and (5) then to GO Slim. Took file from (3) and (6) joined with other species blast files. File names below:
H. kamtschatkana
(1) Haliotis kam contig list
(2) & (3) pinto SPID -> Galaxy 10
(4) Galaxy 11
(5) Galaxy 13
(6) Galaxy 22-27
O. lurida
(1) Ostrea lur contig list
(2) & (3) oly SPID -> Galaxy 37
(4) Galaxy 38
(5) Galaxy 39
(6) Galaxy 40-45
July 26, 2012
Bioinformatics: Pinto Ab NGS
Found Pearson correlation coefficients across bit scores for multi-species blast, for both Oly and Pinto. Did not limit bit scores by e-value, so all data were used for analyses. When there was no bit score, entered "NA". Used R for analysis and function cor.test. All pairwise correlations were positive and significant (p < 2.2e-16). Plotted correlation coefficients by species. Species are laid out on the x-axis and then for each species, the correlation coefficient for bit scores with another species is plotted and color coded. There is redundancy in these graphs, since the correlation with H. midae is plotted for C. gigas and the correlation with C. gigas is plotted for H. midae. The first plot shows the correlations for pinto and the second is for oly.
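For reference, the coefficient that cor.test reports can be written out in a few lines of Python. This sketch only computes Pearson's r (no p-value), uses `None` to stand in for "NA", and drops incomplete pairs; the bit scores are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation over pairs where both values are present
    (None marks a missing bit score)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bit scores of the same contigs against two species.
species_a = [100, 200, None, 400, 500]
species_b = [110, 190, 300, None, 520]
r = pearson_r(species_a, species_b)
```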
July 25, 2012
Secondary stress: RNA-Seq
Annotated sequences with barcodes (MPX and BC) so that I could see what was the real sequence. Barcodes are in gray, vector sequence is in red. Sequences that were good quality and had sequenced inserts were 13_1, 13_4, 271_1, 271_2, 274, 58_1, and 67_3. The inserted sequence, when blasted to Sigenae version 8, returns some fragmented hits to contigs. No hits are returned searching C. gigas ESTs or nucleotides in GenBank.
Spawning at Taylor hatchery
Sampled larvae from outbred crosses today. Filtered entire volume of soda bottles onto a 60 µm screen. Taking minimal amount of seawater, pipetted up larvae and put into 1 mL of 100% EtOH (2 mL screw cap tube). Checked a small aliquot of one of the samples to make sure that I was, in fact, sampling larvae. The larvae all looked good: D-hinge, no immediately apparent abnormalities.
July 24, 2012
Secondary stress: RNA-Seq
Sam did plasmid preps yesterday using the Qiagen kit.
I am doing sequencing prep and sequencing at NWFSC. First did a Big Dye reaction to prepare for sequencing. Made a master mix of 2 ul 5x buffer, 0.32 ul M13 forward primer, 1 ul terminator mix (Big Dye), 2.68 ul water, and 4 ul template DNA from plasmids (total reaction volume = 10 ul). Plate layout is below. Thermal cycler profile: 96C 10s, 50C 5s, 60C 4 min (30 cycles).
| | 1 | 2 | 3 |
| A | 271-2 | 67*-3 | 58*-2 |
| B | 67*-2 | 274* | |
| C | 13-4 | NEG | |
| D | 67-1 | 13-3 | |
| E | 13-2 | 61-2 | |
| F | 271-1 | 61-1 | |
| G | 280* | 13-1 | |
| H | 16 | 58*-1 | |
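The Big Dye recipe above can be scaled to the whole plate. This is a rough Python sketch only. It assumes the template is added to each well separately rather than to the mix, and the 10% pipetting overage and the function name scale_master_mix are illustrative choices, not values from the protocol.

```python
# Per-reaction volumes (µl) from the entry above.
# Assumption: template is added per well, so it is left out of the mix.
PER_REACTION_UL = {
    "5x buffer": 2.0,
    "M13 forward primer": 0.32,
    "Big Dye terminator mix": 1.0,
    "water": 2.68,
}
TEMPLATE_UL = 4.0

def scale_master_mix(per_reaction, n_reactions, overage=0.10):
    """Total volume of each component for n reactions plus a pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}

# 17 filled wells in the plate layout (16 clones + NEG); 10% overage is an assumed value.
mix = scale_master_mix(PER_REACTION_UL, n_reactions=17)
per_well_mix_ul = sum(PER_REACTION_UL.values())  # mix volume per well, before template
for name, vol in mix.items():
    print(f"{name}: {vol} µl")
print(f"per well: {per_well_mix_ul:.1f} µl mix + {TEMPLATE_UL} µl template")
```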
Cleaned up Big Dye reaction using Agencourt CleanSEQ. Resuspended magnetic beads and aliquoted 10 ul of beads per well. Added 42 ul 85% EtOH and pipetted up and down 7 times to mix well. Placed on magnetic plate for a few minutes, then removed EtOH. Washed 2 more times with 100 ul of EtOH, letting sit about 30 s between each wash. Let dry completely at room temperature. Added 40 ul of 0.1 mM EDTA and let sit for a couple of minutes. Briefly spun down plate and loaded on ABI 3100.
July 23, 2012
Spawning at Taylor hatchery
Helped in spawning of inbred lines of C. gigas. Oysters were originally wild broodstock from Canada (Pipestem) that were randomly selected to create inbred lines. Spawning was 1:1 within families (i.e. 1 male from family 1 was spawned with 1 female from family 1). We also made 10 outbred crosses to test the parentage assignment of our microsatellite panel (female x male): 16x6, 49x29, 75x15, 15x18, 70x58, 18x73, 6x49, 29x16, 4x70, and 58x4. For each cross, females were strip spawned and eggs were washed through an 80 µm sieve onto a 20 µm sieve. The eggs were then homogenized and counted on a Coulter counter to create a beaker of 800 mL containing 1.2 million eggs. Eggs were left to sit on the counter for at least 30 minutes so that they could "round up". Males were strip spawned and sperm were left to activate for ~5 minutes before dilute sperm (~2 mL) was used to fertilize eggs. Fertilization was allowed to progress for 30 minutes before fertilized eggs were put into buckets. The outbred crosses were done in the morning; in the afternoon they were transferred to 2 L soda bottles and brought back to Seattle (only about 1/4 of the fertilized eggs were brought back, so ~300,000). For each adult spawned, I took a sample of adductor muscle for USC and a sample of mantle for Sea Grant genotyping efforts. Samples are stored in screw cap tubes in 95% EtOH. Sample labels are as follows: 12x4 means that in 2012 they were used in the 4th spawn (male or female); 10x9 means that in 2010 they resulted from the 9th spawn; a label such as 1x1 means that they were used in the family 1 inbred cross.
Bottles of fertilized embryos were put in the basement. Sammi will aerate tomorrow morning.
July 22, 2012
Secondary stress: RNA-Seq
3:30 pm
From gridded plate (7/20/12) picked bacteria with toothpick and put in 3 mL of LB+Kan broth (7/20/12). Grew up at 37C, 200 rpm.
July 20, 2012
Secondary stress: RNA-Seq
Cloning
Not many colonies grew on the plates and there are very few white colonies. Will pick and sequence all white colonies that are there.
Made liquid culture broth (1X LB + kanamycin): 100 mL 5X LB lab stock, 400 mL nano pure water, 500 µl 50 mg/mL kanamycin.
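The working concentrations in this broth follow from C1 x V1 = C2 x V2. A short Python sketch of that arithmetic is below. The function name is illustrative, and it assumes the antibiotic volume is added on top of the 500 mL of diluted LB.

```python
def final_concentration(stock_conc, stock_vol, final_vol):
    """C2 = C1 * V1 / V2; both volumes must be in the same unit."""
    return stock_conc * stock_vol / final_vol

# LB: 100 mL of 5X stock brought to 500 mL.
lb_x = final_concentration(5.0, 100.0, 500.0)

# Kanamycin: 0.5 mL (500 µl) of 50 mg/mL stock.
# Assumption: it is added on top of the 500 mL, giving 500.5 mL total.
kan_mg_per_ml = final_concentration(50.0, 0.5, 500.5)
kan_ug_per_ml = kan_mg_per_ml * 1000

print(f"LB: {lb_x:.1f}X, kanamycin: {kan_ug_per_ml:.1f} µg/mL")
```

This works out to roughly 50 µg/mL kanamycin in 1X LB.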
Restreaked white colonies onto gridded plate. Warmed plate at 37°C before restreaking. Some of the colonies may have had a tiny bit of blue in them and are indicated with a "*". 2 blue colonies (1 from 274 and 1 from 280) were streaked as negative controls. Restreaked colonies are 274*, 4 from 13, 67, 67*, 2 from 58*, 2 from 61, 16, 2 from 271, 280*. Plate was put in incubator at 37°C.
Bioinformatics: Pinto Ab NGS
Made plots to demonstrate the Oly or Pinto contigs that are shared between transcriptomes across multiple species (based on BLAST results). These plots only include Oly/Pinto contigs if they were annotated with SPID at e-value of at least 1e-5 and blast hits only if they matched with the same e-value cut off. The y-axis is a list of contigs for the transcriptome being compared (Oly/Pinto) and then each dot on the graph represents a match between that original transcriptome and a gene in one of the other species.
July 19, 2012
Secondary stress: RNA-Seq
Cloning
Chose 8 samples at random to clone and Sanger sequence: 13, 16, 58, 61, 67, 271, 274, 280. Cloning was done using TOPO pCR2.1 kit following manufacturer's protocol. To 2 µl of PCR product from each of those samples (PCR done 7/17/12) added 0.5 µl TOPO salt solution and 0.5 µl TOPO vector. Incubated at 22°C for 10 minutes and put on ice. Thawed competent cells on ice (One Shot TOP10 cells). Added 2 µl of PCR/salt/vector mix to the competent cells (swirling while adding) and incubated for 10 minutes on ice. Heat shocked for 30 s in a water bath at 42°C and put on ice for 2 minutes. Added 250 µl SOC (room temp) to each sample (under hood), rolling the tubes to coat sides. Incubated at 37°C, 200 rpm for 1 hour. Meanwhile, LB+Kan plates made yesterday were warmed for >30 minutes at 37°C, spread with 40 µl of 40 mg/mL X-gal, and kept in incubator until ready for use (with lids slightly cracked to evaporate some moisture). Sam spread the plates after the samples were done incubating: for each sample, one plate was spread with 50 µl and another with 200 µl of competent cell solution. Plates were returned to 37°C incubator.
Bioinformatics: Pinto Ab NGS
Redid GO Slim pie charts for Pinto and Oly so that GO Slim terms are non-redundant with respect to contig.
Started to work on graph depicting the contigs that match different species from blast searches. Only contigs with an e-value of at least 1e-5 were used. A horizontal bar graph was made with individual O. lurida contigs on the y-axis and number of cross-species orthologs (correct term?) on the x-axis. The x-axis represents how many other species had contigs that matched the original O. lurida contig at the pre-determined e-value. The fewest contigs matched across all 5 species (C. gigas, P. fucata, R. philippinarum, H. midae, D. rerio; n=574) and the most matched with just one other species (n=4,746) or 2 other species (n=4,700).
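The count behind each bar is the number of species with a hit at or better than the cut-off for a given contig. A small Python sketch of that tally is below, reading "an e-value of at least 1e-5" as e-value <= 1e-5. The contig names and e-values are hypothetical, and the function name is illustrative.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-5
SPECIES = ["C. gigas", "P. fucata", "R. philippinarum", "H. midae", "D. rerio"]

def n_species_matched(evalues, cutoff=E_VALUE_CUTOFF):
    """Number of species with a hit at or below the e-value cutoff (missing = no hit)."""
    return sum(1 for sp in SPECIES if evalues.get(sp) is not None and evalues[sp] <= cutoff)

# Hypothetical O. lurida contigs and blast e-values, for illustration only.
hits = {
    "contig_1": {"C. gigas": 1e-40, "P. fucata": 2e-12, "H. midae": 1e-3},
    "contig_2": {"C. gigas": 5e-8},
    "contig_3": {sp: 1e-20 for sp in SPECIES},
    "contig_4": {"D. rerio": 0.5},
}
per_contig = {name: n_species_matched(ev) for name, ev in hits.items()}
# Key: number of species matched; value: number of contigs with that many matches.
bar_heights = Counter(per_contig.values())
print(per_contig)
```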
July 18, 2012
Secondary stress: RNA-Seq
It was difficult to access the liquid that the gel slices were sitting in, so I spun them down on Millipore gel extraction columns (5,000xg for 10 minutes). I started with a subset of 8 samples. Concentrations of the cDNA product post-gel extraction can be found in the spreadsheet. Did a PCR using 5 ng of template for each sample. Each reaction consisted of 1 µl 2.5 mM dNTPs, 1 µl 10X PCR buffer, 0.2 µl 10 µM ILL-Lib1-20 oligo, 0.2 µl 10 µM ILL-Lib2 oligo, 1 µl Titanium Taq, and 6.6 µl template+H2O (volume of template varied depending on concentration). Amplified on thermal cycler with the following profile: 95°C 5 minutes; 15 cycles of 95°C 40s, 63°C 1 minute, 72°C 1 minute. Ran 3 µl of product on 1% agarose gel with EtBr (made with 1x modified TAE) for 5 minutes with TAE level below top of gel and then for 30 additional minutes with buffer covering gel (using Hyperladder II).
The smear is in the ~250-300 bp range (the tail above the smear is an artifact from dry loading the gel).
Prepped the other 24 samples the same way, but did not spin them in the Millipore columns. Dropped gel band from 271 on the ground. Rinsed it with nano pure water and put back in tube. After dry loading the gel, ran for 2 minutes at 100V and then covered with TAE to run an additional 40 minutes. 300 bp band is circled in blue. Sample 271 has a faint band at a larger size than 250-300 bp (circled in white).
Cloning
Made LB for plates. Mixed 100 mL of 5X LB lab stock with 400 mL nano pure water and 7.5 g Bacto agar. Swirled to mix, covered with foil, and put in autoclave at 121°C for 20 minutes (sterilization only). After flask cooled enough to touch, added 500 µl of 50 mg/mL kanamycin. Filled plates on a sterilized lab bench, let solidify, and then placed at 4°C.
July 17, 2012
Secondary stress: RNA-Seq
Re-did PCR from yesterday to test barcode PCR, but with a couple of changes. Only ran 8 samples per reaction (samples 1-22 for A, 49-70 for B, 217-238 for C, and 264-286 for D). Also only used 5 ng of cDNA per reaction by making a dilution to 2.5 ng/µl and using 2 µl as template. The product in D is definitely brighter than the product in C, but there's still too much showing up in C.
Re-did PCR with only 1.2 µl (3 ng) of template (added 0.8 µl of water to make up the difference in volume).
Did big PCR with 15 ng (6 µl of 2.5 ng/µl dilution of cDNA) of template, 31 µl H2O, 5 µl 2.5 mM dNTP, 5 µl 10X PCR buffer, 1 µl Titanium Taq. Added 1 µl of each 10 µM barcode to corresponding wells (see spreadsheet 7/16/12). Amplified using same profile as 7/16/12.
Made 10X TBE: 108g Tris base, 55g boric acid, 40 mL 0.5 M EDTA (pH 8.0), and filled to 1 L with Nanopure water. Stirred for 30 minutes. Diluted to 1X with nano pure water and used to make and run gels.
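As a cross-check on the recipe, the molar concentrations of the 10X stock and the 1X working buffer can be worked out from the masses. This Python sketch uses the standard molecular weights of Tris base (121.14 g/mol) and boric acid (61.83 g/mol); the function name is illustrative.

```python
TRIS_MW = 121.14        # g/mol, Tris base
BORIC_ACID_MW = 61.83   # g/mol

def molar(grams, mw, litres):
    """Molarity (mol/L) of a solute of given mass and molecular weight."""
    return grams / mw / litres

tris_10x = molar(108, TRIS_MW, 1.0)
borate_10x = molar(55, BORIC_ACID_MW, 1.0)
edta_10x = 0.5 * 40 / 1000          # 40 mL of 0.5 M EDTA in 1 L final volume

# 1X working buffer is a 10-fold dilution; report in mM.
for name, conc in [("Tris", tris_10x), ("borate", borate_10x), ("EDTA", edta_10x)]:
    print(f"{name}: {conc:.3f} M at 10X, {conc * 100:.0f} mM at 1X")
```

The result is close to the usual 89 mM Tris, 89 mM borate, 2 mM EDTA for 1X TBE.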
Made 2% agarose gels with 1X TBE and 10 µl SYBR Safe DNA staining dye (Invitrogen). Loaded entirety of PCR product into wells (dry loading) and used 10 µL of low molecular weight ladder pBR322 DNA-MspI Digest. Before loading, mixed 10 µl ladder with 40 µl nano pure water and 10 µl 6X loading dye. Ran gels at 100 V for 5 minutes so that product could leave well and then covered gels with 1X TBE and ran for an additional 65 minutes. Cut out bands between 250-300 bp and put bands for each individual in a labeled tube. Added 40 µl nano pure water and spun down at 10,000xg for 30 s. Stored at 4°C overnight. There were bands on the gel, but they were very light. The bands for 232-274 were much longer than the other bands and are not as completely submerged in the water.
July 16, 2012
Secondary stress: RNA-Seq
Purified the PCR products from reaction run 7/13/12 using a NucleoMag 96 PCR Cleanup Kit. Added 60 mL 100% EtOH to buffer MP3 before beginning protocol. Divided each 100 µl PCR reaction into 2 wells per reaction in NucleoMag U plate. Added 6 µl well-mixed magnetic beads to each well and 138 µl buffer MP1 (mixed thoroughly by pipetting). Placed on magnetic separator plate for 1 minute and removed supernatant. Washed beads with 200 µl MP2 and 200 µl MP3 in succession, placing on magnetic separator and removing supernatant between each step. For second wash with MP3, added just 100 µl of buffer to each well and then combined the previously split PCR products into one well each. Put on magnetic separator and removed supernatant. Let beads dry for 10 minutes. Added 25 µl MP4 elution buffer and mixed well; let incubate 5 minutes at room temp. Placed on magnetic separator for 1 minute and collected eluted DNA (supernatant) and put into a new well plate. Quantified DNA on Nanodrop. See spreadsheet for concentrations.
Began adaptor extension and size selection. Diluted an aliquot of the amplified, cleaned up cDNA in Nanopure water so that the concentration was 5 ng/µl. Diluted barcode oligos to 1 µM and stored in a well plate in -20°C (see spreadsheet for barcode assignments). Prepared 4 master mixes to test PCR -
master mix A: 5.8 µl H2O, 1 µl 2.5 mM dNTP, 1 µl 10X PCR buffer, 0.2 µl Titanium Taq
B: 3.8 µl H2O, 1 µl 2.5 mM dNTP, 1 µl 10X PCR buffer, 0.2 µl Titanium Taq, 2 µl 1 µM TruSeq-Mpx oligo
C: 3.8 µl H2O, 1 µl 2.5 mM dNTP, 1 µl 10X PCR buffer, 0.2 µl Titanium Taq, 2 µl 1 µM TruSeq-BC oligo
D: 1.8 µl H2O, 1 µl 2.5 mM dNTP, 1 µl 10X PCR buffer, 0.2 µl Titanium Taq, 2 µl 1 µM TruSeq-Mpx oligo, 2 µl 1 µM TruSeq-BC oligo
For each master mix, the mix was aliquoted first, followed by the oligos, and then the cDNA. The reactions were amplified in a thermal cycler: 95°C 5 minutes; 4 cycles of 95°C 40s, 63°C 1 min, 72°C 1 min. Ran out 5 µl of product on a 1% agarose gel with EtBr.
Samples in reaction D amplified, as they were supposed to. There was no amplification in reaction A; however, there was amplification in reactions B and C. Tomorrow I will try the same thing again but use half the cDNA template for the reaction.
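The four test mixes differ only in which oligos are present, with water making up the difference. A Python sketch of that design is below; it rebuilds mixes A-D from the shared components and confirms that each comes to the same volume. The 8 µl mix volume is taken from the sums of the recipes above, and the function name is illustrative.

```python
# Components shared by all four mixes (µl per reaction).
BASE = {"2.5 mM dNTP": 1.0, "10X PCR buffer": 1.0, "Titanium Taq": 0.2}
OLIGO_UL = 2.0      # each 1 µM oligo, when present
MIX_TOTAL_UL = 8.0  # each recipe above sums to 8 µl before cDNA is added

def build_mix(oligos):
    """Return per-reaction volumes, with water filling the mix up to MIX_TOTAL_UL."""
    mix = dict(BASE)
    for oligo in oligos:
        mix[oligo] = OLIGO_UL
    mix["H2O"] = round(MIX_TOTAL_UL - sum(mix.values()), 2)
    return mix

mixes = {
    "A": build_mix([]),
    "B": build_mix(["TruSeq-Mpx oligo"]),
    "C": build_mix(["TruSeq-BC oligo"]),
    "D": build_mix(["TruSeq-Mpx oligo", "TruSeq-BC oligo"]),
}
for label, mix in mixes.items():
    print(label, mix)
```

Keeping the volume constant means any difference between A-D on the gel reflects the oligos rather than a change in reagent concentration.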
July 13, 2012
Secondary Stress: RNA-Seq
Did a full-scale PCR of the cDNA made yesterday (7/12/12). For each reaction: 59 µl H2O, 10 µl 10X buffer, 10 µl 2.5 mM dNTPs, 2 µl 5ILL oligo, 2 µl 3ILL-20TV oligo, and 2 µl titanium taq. To the 85 µl of master mix in each well added 15 µl of well-mixed cDNA. PCR profile: 95°C for 5 minutes; 17 cycles of 95°C 40 s, 63°C 1 min, 72°C 1 min. Loaded 5 µl of each PCR'd sample onto a 1% agarose gel with EtBr (Hyperladder II). At the end of the gel put on 6 samples from the C PCR that did not make it on yesterday (samples have been sitting at room temp overnight). Dry loaded the gel and ran for 5 minutes at 100 V to move the samples out of the wells (checked with UV light to make sure they had moved). Added 1x TAE buffer to cover gel and resumed running for about 55 minutes. This method seemed to somewhat fix the streaking problem in the gels.
July 12, 2012
Secondary Stress: RNA-Seq
Yesterday (7/11/12), Sam reconstituted the oligo primers and made cDNA from the RNA I fragmented 7/3/12 (see his notebook). This morning, I accidentally contaminated the plate of cDNA and so had to start over again with the RNA fragmentation. I repeated the steps taken 7/3/12. The gel below is loaded in the same order as the previous one and was run at 110 V for ~30 minutes. The RNA again fragmented into the correct size range (100-500 bp).
After fragmentation, the fragmented RNA was used for cDNA synthesis. To each well containing ~10 µL RNA, 1 µl of 10 µM 3ILL-20TV oligonucleotide was added and mixed by pipetting. The plate was incubated at 65°C for 3 minutes and then transferred to ice. Master mix was made with the following per reaction: 1 µl Nanopure H2O, 1 µl 10 mM dNTP, 2 µl 0.1 M DTT, 4 µL 5X buffer, 1 µL 10 µM 5ILL-SW oligonucleotide, 1 µl SuperScript II reverse transcriptase. 10 µl of the mix was added to each well, mixed by pipetting, and incubated for 1 hour at 42°C followed by a 5 minute 65°C deactivation (thermal cycler protocol = SSRT). The cDNA was diluted 1:5 by adding 80 µl Nanopure water.
4 master mixes were prepped to test PCR of the cDNA. Volumes below are per reaction:
master mix A: 12.6 µL H2O, 2 µL 2.5 mM dNTP, 2 µl 10x PCR buffer, 0.4 µl Titanium Taq
B: 12.2 µL H2O, 2 µL 2.5 mM dNTP, 2 µl 10x PCR buffer, 0.4 µl 10 µM 5ILL oligo, 0.4 µl Titanium Taq
C: 12.2 µL H2O, 2 µL 2.5 mM dNTP, 2 µl 10x PCR buffer, 0.4 µl 10 µM 3ILL-20TV oligo, 0.4 µl Titanium Taq
D: 11.8 µL H2O, 2 µL 2.5 mM dNTP, 2 µl 10x PCR buffer, 0.4 µl 10 µM 5ILL oligo, 0.4 µl 10 µM 3ILL-20TV oligo, 0.4 µl Titanium Taq
Each cDNA sample was amplified with each of these master mixes (18 µL master mix, 2 µL cDNA). A-C are on one plate in a thermal cycler and D is on a separate plate in a different thermal cycler. Profile: 95°C 5 minutes; 17 cycles of 95°C 40s, 63°C 1 min, 72°C 1 min. After the 17 cycles, loaded 5 µl of product on 1% agarose gels with EtBr (Hyperladder II). Did not have enough wells to load the last 10 samples from master mix C. Ran gels at 110 V for 40 minutes. Gels were loaded by column (see plate layout 7/3/12) and only the master mix signifier (A-D) is indicated on the gel photo. There was amplification in the ~100-500 bp range for master mix D only. None of the other master mixes showed amplification. The first row on the second gel (mostly master mix A) ran too far, but the samples in the other rows show no amplification.
July 3, 2012
Secondary Stress: RNA-Seq
Took concentrations of second batch of samples using nanodrop, as described 7/2/12. Sample Exp2.271 had 2 peaks in its nanodrop spec, the first one was at <230.
Aliquoted 2 µg of the DNased RNA into a plate and added enough 10 mM Tris to bring the volume to 20 µL. Incubated at 95°C for 25 minutes. Ran 10 µL of each sample on a 1% agarose gel (100 V, 1 hour). Stored remaining volume in the plate (see map below, numbers correspond to sample numbers) on the top shelf of the -80°C.
| col/row# | 9 | 10 | 11 | 12 |
| A | 1 | 49 | 217 | 265 |
| B | 4 | 52 | 220 | 268 |
| C | 7 | 55 | 223 | 271 |
| D | 10 | 58 | 226 | 274 |
| E | 13 | 61 | 229 | 277 |
| F | 16 | 64 | 232 | 280 |
| G | 19 | 67 | 235 | 283 |
| H | 22 | 70 | 238 | 286 |
The majority of the fragmented RNA is in the 100-500 bp range.
July 2, 2012
Secondary stress: RNA-Seq
Made 1 M Tris with pH = 8.0 by dissolving 6 g of Tris base in ~400 mL of water, bringing to pH 8 with 10 N HCl, then adding water to bring to 500 mL (water used was 0.1% DEPC), adjusting again to pH 8. Then diluted 1 mL of 1 M Tris in 99 mL of 0.1% DEPC H2O and adjusted pH to 8.
DNased (rigorous protocol) all extracted samples. Did them in 2 batches, first DNased the samples extracted 6/19 then the ones extracted 6/21/12. Diluted 10 µg of RNA in enough DEPC H2O to equal 50 µL then added 5 µL 10X TURBO buffer and 0.5 µL DNase. Incubated at 37°C for 30 minutes, added 0.5 µL more DNase and incubated another 30 minutes. Measured concentration of the first set of samples using the nanodrop (in triplicate). See spreadsheet for details.
Made a 1 % agarose gel with EtBr. Dry loaded 5 µl of each RNA sample onto gel and ran for ~25 minutes at 110 V to check quality of DNased RNA. Some of the samples look streaky because the sample probably got injected into the gel instead of in the well. The bright bands are rRNA. There is no sign of genomic contamination.
June 29, 2012
Secondary stress: RNA-Seq
Did the same protocol for fragmentation as yesterday with the following changes:
-fragmentation times were 10, 15, 20, and 25 minutes
-Did the same dilution of unfragmented RNA as was done for fragmented (1 µg in 10 µL DEPC H2O)
-dry loaded the gel
-did not load non-DNased RNA
-loaded all 10 µl on gel
25 minutes seems to be the ideal fragmentation time since the majority of the RNA is between 100 and 500 bp.
Bioinformatics: Pinto Ab NGS
The previous file used for interspecies comparison of contigs with pinto abalone was redundant (multiple entries for each contig due to multiple GO annotations). Redid joining other species blastn files in Galaxy with pinto abalone annotated with just SPIDs (this file is non-redundant, Galaxy 151). Pinto abalone contigs that annotate with SPIDs with an e-value cut-off of 1e-5 = 1,351. The numbers of contigs from each species that match these SPID-annotated contigs at the same e-value cut-off are: 625 D. rerio, 848 H. midae, 797 C. gigas, 712 O. lurida, 557 P. fucata, and 499 R. philippinarum. Redid Venn diagrams: numbers in black represent the contigs that uniquely match to that species' database, numbers in italic gray represent contigs that are represented by 2 or 3 databases. Also redid Venns for the Oly contigs so that numbers are now correct.
OLY CONTIG MATCHES
PINTO CONTIG MATCHES
June 28, 2012
Secondary stress: RNA-Seq
Tested incubation time for RNA fragmentation for 3' Tag-based RNA-Seq (Meyer protocol). Used 3 samples of C. gigas gill RNA from previous experiments: T.ISO3 (803.11 ng/µl), PLY3 (351.44 ng/µl), and gill from 10/20/10 (1230.85 ng/µl). Aliquoted 1 µg of RNA from each sample (1.25, 2.85, and 0.81 µL, respectively) into 10 µL of 0.1% DEPC H2O. Each sample was aliquoted into 4 separate PCR strip tube wells to incubate at 5, 10, 15, and 20 minutes at 95°C. After incubation, samples were put on ice until time to load all on gel. Also loaded on the 1% agarose gel with EtBr was 100 ng of the unfragmented DNased RNA (diluted in DEPC H2O) and 5 µL of a 1:10 dilution of the un-DNased RNA. 5 µL of each sample was mixed with 0.5 µL of 10x loading buffer. A HyperII ladder was used. For each time point, the samples were loaded in the following order: TISO3, PLY3, gill. The gel was run for about 1 hour at 100V.
The main part of the RNA smear that we are interested in is obscured by the dye front. Am going to do the same thing tomorrow, but dry load the gel without dye so that I can clearly see the RNA smear. I will not do the 5 minute incubation time tomorrow because the RNA is not fragmented enough. Also, more unfragmented RNA needs to be loaded as a control (the lines are very faint).
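The 1 µg aliquot volumes listed for the three test samples come from dividing the target mass by the Nanodrop concentration. A minimal Python sketch of that calculation is below; the function name is illustrative.

```python
def volume_for_mass(mass_ng, conc_ng_per_ul):
    """Volume (µl) of stock needed to deliver mass_ng of RNA."""
    return mass_ng / conc_ng_per_ul

# Concentrations (ng/µl) from the entry above.
samples = {"T.ISO3": 803.11, "PLY3": 351.44, "gill 10/20/10": 1230.85}
volumes = {name: round(volume_for_mass(1000, conc), 2) for name, conc in samples.items()}
for name, vol in volumes.items():
    print(f"{name}: {vol} µl for 1 µg")
```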
June 27, 2012
Bioinformatics: Pinto Ab NGS
Fixed the Oly blast to H. midae file and rejoined blast results in Galaxy (Galaxy 144). For comparative species analysis, only used Oly contigs that were annotated with SPID at an e-value of at least 1e-5 and only used blastn results that matched with e-value of at least 1e-5. At the SPID cut-off, this is 15,918 contigs. At this cut-off, 13,064 C. gigas contigs matched to Oly contigs, as did 9,078 P. fucata contigs, 1,543 R. philippinarum contigs, 2,002 H. midae contigs, and 6,081 D. rerio contigs.
Redid numbers for Venn diagrams so that the numbers for the overlap between data sets represent those contigs annotated by only those 2 datasets (i.e. for a D. rerio-C. gigas overlap the number of contigs would be those that are shared between those 2 databases and are not found in the 3rd database). Deleted previous Venn files.
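The corrected overlap numbers correspond to the seven mutually exclusive regions of a three-set Venn diagram, so that a pairwise overlap excludes anything also found in the third database. A short Python sketch with set operations is below. The contig IDs are hypothetical, and the function name is illustrative.

```python
def venn_regions(a, b, c):
    """Split three sets into the seven exclusive regions of a Venn diagram."""
    return {
        "a only": a - b - c,
        "b only": b - a - c,
        "c only": c - a - b,
        "a&b only": (a & b) - c,
        "a&c only": (a & c) - b,
        "b&c only": (b & c) - a,
        "a&b&c": a & b & c,
    }

# Hypothetical contig IDs matched in each database.
cgigas = {"c1", "c2", "c3", "c4"}
pearl = {"c2", "c3", "c5"}
danio = {"c3", "c4", "c6"}
regions = venn_regions(cgigas, pearl, danio)
sizes = {name: len(members) for name, members in regions.items()}
print(sizes)
```

Because the regions do not overlap, their sizes add up to the number of contigs matched by at least one database.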
Secondary stress: histology
Looked over the 8 histo slides with Carolyn (see 5/24/12). Took some pictures of slides as references for identifying anatomical features.
For quantifying metaplasia, calculate the proportion of tubules that are dilated. Normal tubules have a 4-point star-like shape in the middle, whereas dilated ones are more open with rounded centers (cuboidal metaplasia).
Diapedesis: hemocytes move through epithelium. Look for this especially in the intestinal epithelium. Define range of metaplasia seen and develop ranking score.
Vacuolization: occurs within epithelium of digestive tubules or intestine. Sometimes vacuolization is a response to stress.
Sloughing of cellular material into digestive tubules.
Gill: look for normal structure (epithelium, amount of mucous cells, ciliary tuft structure, vacuoles), hemocyte influx
For the kinds of histo sections I have (which are inconsistent) it will be easiest to look at/quantify changes in the digestive gland, germinal follicles, and gills.
The parasite is from one of Mac's oysters.
June 25, 2012
Bioinformatics: Pinto Ab NGS
Made alignments and phylogenetic trees using Geneious tree builder as described for 6/22/12. None of the sequences have been trimmed to the exact same length for any of the trees. If a sequence was much shorter than the others in the alignment, it was excluded.
v-type proton ATPase
not included: P. fucata, H. kamtschatkana
sequences reversed: R. philippinarum, H. midae
Transmembrane protein 85
reversed sequences: D. rerio, O. lurida, H. midae
O. lurida may be closer to D. rerio in the tree because it is the only sequence that overlaps with Danio
Translation initiation factor
reversed sequences: D. rerio, O. lurida, H. midae
the P. fucata sequence is really too short to include, but the alignment did not work without it.
HSP90
not included: R. philippinarum, P. fucata
reversed sequences: O. lurida
HSP83
Alignment did not work
HSP82
not included: H. kamtschatkana, P. fucata
reversed sequences: O. lurida
HSP70
not included: H. kamtschatkana, H. midae
reversed sequences: R. philippinarum
Cathepsin L
reversed sequences: H. kamtschatkana
GABA receptor associated protein
reversed sequences: H. midae
Redoing alignments and trees including all sequences, regardless of length, and then trimming alignments so that all sequences in tree are the same length.
v-type proton ATPase
shortest sequence: H. kamtschatkana (221 nucleotides)
Transmembrane protein 85
excluded D. rerio from alignment because it does not overlap with the shortest sequence
shortest sequence: H. midae (230 nucleotides), but P. fucata determines the 3' end, so the alignment is only 194 nucleotides.
Translation initiation factor
shortest sequence: P. fucata (171 nucleotides)
Cannot get alignment to work :(
HSP90
could not include R. philippinarum sequence in alignment
shortest sequence: P. fucata (438 nucleotides), but H. kamtschatkana does not extend all the way to the 5' end, so alignment is 273 nucleotides
HSP82
could not include P. fucata in sequence alignment
shortest sequence: H. kamtschatkana (647 nucleotides) but alignment is only 244 nucleotides long
HSP70
shortest sequence: H. midae (244 nucleotides) but alignment is 179 nucleotides long
Heat shock 70 cognate
shortest sequence: R. philippinarum (885 nucleotides). Alignment is 622 nucleotides, except the P. fucata sequence is only 524 nucleotides long (it had a gap).
GABA receptor associated protein
shortest sequence: R. philippinarum (568 nucleotides), alignment is 512 nucleotides long.
Cathepsin L
shortest sequence: P. fucata (450 nucleotides), alignment is 289 nucleotides
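Trimming an alignment so that every sequence covers the same columns explains why the final alignments are shorter than the shortest sequence: the shared region is set by the latest start and the earliest end. A Python sketch of that trimming is below. It is not what Geneious does internally, and the aligned sequences and function name are made up for illustration.

```python
def trim_to_shared_region(aligned):
    """Trim aligned sequences to the columns spanned by every sequence.

    Leading/trailing '-' are treated as missing ends; internal gaps are kept.
    """
    starts, ends = [], []
    for seq in aligned.values():
        starts.append(len(seq) - len(seq.lstrip("-")))  # first non-gap column
        ends.append(len(seq.rstrip("-")))               # one past last non-gap column
    start, end = max(starts), min(ends)
    if start >= end:
        raise ValueError("sequences share no overlapping columns")
    return {name: seq[start:end] for name, seq in aligned.items()}

# Hypothetical aligned sequences of equal length (12 columns).
aligned = {
    "seq_a": "ACGTACGTACGT",
    "seq_b": "--GTACGTAC--",
    "seq_c": "---TAC-TACG-",
}
trimmed = trim_to_shared_region(aligned)
print(trimmed)
```

An internal gap (as in seq_c) survives trimming, which is how one sequence can contain fewer nucleotides than the alignment length.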
June 22, 2012
Bioinformatics: Pinto Ab NGS
For the genes of interest for phylogenetics (see 6/21/12) imported the sequences that matched the same pinto ab contig in blastn searches (in file Galaxy 141). The 7 sequences correspond to pinto ab, Olympia oyster, C. gigas, pearl oyster, Manila clam, H. midae, and D. rerio. In the first alignment, the Manila clam and O. lurida sequences did not align well, so I took the reverse complements of both and realigned for a better result. Alignments were made according to the following parameters: cost matrix 65% similarity (5.0/-4.0); gap open penalty 12; gap extension penalty 3; global alignment with free end gaps; automatically determine sequences' direction; 2 refinement iterations.
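Taking the reverse complement, as was done for the Manila clam and O. lurida sequences, flips a contig that was assembled on the opposite strand. A minimal Python sketch is below; the function name and example sequence are illustrative, and only A, C, G, T, and N are handled.

```python
# Translation table for the complement of each base (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]

example = "ATGCCN"
print(reverse_complement(example))
```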
Made a phylogenetic tree in Geneious using PHYML plug-in: HKY85 substitution model, 100 bootstraps, transition/transversion ratio for DNA models fixed at 4, proportion of invariable sites fixed at 0, number of substitution rate categories 1, Gamma distribution parameter estimated, no optimization.
Made a phylogenetic tree using Geneious tree builder: genetic distance model HKY, tree build method neighbor-joining, D. rerio as outgroup, bootstrap 100 times, create consensus tree, support threshold 50%.
June 21, 2012
Secondary stress: Exp2
RNA extractions of 8 samples from experiment 2 - 103B (control) samples Exp2.265, 268, 271, 274, 277, 280, 283, 286. Followed same protocol as described 6/15/12. Sample Exp2.271 seems to be poor quality: the three 260/280 were 1.97, 1.99, and 1.96; the three 260/230 were 1.78, 0.58, and 1.74.
| Sample | Tissue mass (g) | Avg ng/µL |
| Exp2.265 | 0.09 | 841.1 |
| Exp2.268 | 0.10 | 790.8 |
| Exp2.271 | 0.06 | 639.8 |
| Exp2.274 | 0.08 | 637.1 |
| Exp2.277 | 0.04 | 364.2 |
| Exp2.280 | 0.09 | 744.4 |
| Exp2.283 | 0.06 | 564.2 |
| Exp2.286 | 0.05 | 405.2 |
Bioinformatics: Pinto Ab NGS
Need to choose genes to do phylogenetics. Created a list of contigs that are shared across all species data sets. Filtered according to e-value: only used contigs that were annotated by SPID at at least 1e-5 and that matched pinto abalone contigs with an e-value of at least 1e-5. This ended up being 116 contigs once redundancies were removed due to multiple GO term matches. Those highlighted in green are from non-eukaryotes or plants and those in pink text are potentially interesting stress genes. 13 of the entries match to organisms that were probably contaminating the RNA sample. Genes of interest for phylogenetics include: heat shock cognate 70, eukaryotic translation initiation factor, HSP 90, HSP 82, HSP 83, HSP 70, v-type proton ATPase, transmembrane protein 85 (implicated in apoptosis).
June 19, 2012
Secondary stress: Histo
Histo slides are in! I just looked at them briefly and they look ok - at least one of them the cross section of the oyster body did not turn out well. All slides have a cross section and a section of the adductor muscle.
Bioinformatics: Pinto Ab NGS
Realized that there was a formatting error with the Oly and pinto files that were blasted against the H. midae contigs. Analysis for 6/18/12 for pinto abalone has been corrected, but need to redo that part of the analysis for Oly.
Secondary stress: Exp2
RNA extractions of 8 samples from experiment 2 - 103B (control) samples Exp2.217, 220, 223, 226, 229, 232, 235, 238. Followed same protocol as described 6/15/12.
| Sample | Tissue mass (g) | Avg. ng/µL |
| Exp2.217 | 0.06 | 413.2 |
| Exp2.220 | 0.06 | 394.4 |
| Exp2.223 | 0.08 | 556 |
| Exp2.226 | 0.09 | 640.7 |
| Exp2.229 | 0.04 | 347.2 |
| Exp2.232 | 0.09 | 678.9 |
| Exp2.235 | 0.05 | 432.2 |
| Exp2.238 | 0.04 | 334.1 |
June 18, 2012
Secondary stress: Exp 2
RNA extractions of 8 samples from experiment 2 - 101B (highest pCO2) samples Exp2.49, 52, 55, 58, 61, 64, 67, 70. Followed same protocol as described 6/15/12.
| Sample | Tissue mass (g) | Avg. ng/µL |
| Exp2.49 | 0.15 | 697.8 |
| Exp2.52 | 0.11 | 736.0 |
| Exp2.55 | 0.06 | 404.9 |
| Exp2.58 | 0.05 | 279.2 |
| Exp2.61 | 0.04 | 294.1 |
| Exp2.64 | 0.07 | 662.6 |
| Exp2.67 | 0.09 | 574.5 |
| Exp2.70 | 0.03 | 324.1 |
Bioinformatics: Pinto Ab NGS
Downloaded blastx results that SR did 6/15/12 (annotations of pinto ab contigs with Swiss-Prot IDs). Joined with Swiss-Prot-GO association file and then with GO to GO Slim file (Galaxy 110). 34,959 contigs were annotated with SPIDs. Downloaded file and removed all redundant entries (7,389 contigs), then filtered the data so that I analyzed only those contigs that were annotated by SPID with an e-value of at least 1e-5 and corresponded to biological processes according to GO terms (1,358 contigs). After removing "other biological processes" and "other metabolic processes" there were 951 contigs left that met the above criteria. Made a pivot table of these GO Slim terms - see pie chart below.
Uploaded pinto contig file annotated with SPIDs and GO terms into Galaxy. Also uploaded blast results from blasting pinto contigs against species-specific databases C. gigas, H. midae, O. lurida, P. fucata, Manila clam, and D. rerio. Joined the files in the order listed by matching H. kam contig numbers. This final file is Galaxy 130. Deleted all entries with SPID annotation of pinto contig greater than 1e-5. 767 C. gigas contigs correspond to Pinto contigs with a blastn e-value of at least 1e-5, 945 H. midae contigs, 717 O. lurida contigs, 564 pearl oyster contigs, 493 Manila clam contigs, and 614 Danio contigs.
June 15, 2012
Secondary stress: Exp 2
Did RNA extractions for 8 samples from experiment 2 (started 1/14/12 and sampled 2/11/12). Will begin sequencing effort with highest pCO2 samples (101B), tubes numbered Exp2.1, 4, 7, 10, 13, 16, 19, 22, 49, 52, 55, 58, 61, 64, 67, 70 and control (103B), tubes numbered Exp2.217, 220, 223, 226, 229, 232, 235, 238, 265, 268, 271, 274, 277, 280, 283, 286. All of these tubes are anterior gill samples. Today extracted RNA from Exp2.1, 4, 7, 10, 13, 16, 19, and 22. Used Tri Reagent and followed manufacturer's protocol. Weighed the tissues before sampling. Only one (Exp2.13) was > 0.1 g, so I cut it in half and returned the remaining half to the -80C. All extractions had large pellets and to dissolve them I added 200 µl 0.1% DEPC H2O to the dried pellet and heated at 55°C for ~5 minutes. Pipetted multiple times to homogenize and measured concentration on the Nanodrop, 3 times for each sample (2 µL each time). Stored samples in gray plastic box in -80, labeled FHL OA: Secondary Stress Exp 2 RNA Box 1.
| Sample | Tissue mass (g) | Avg. ng/µl |
| Exp2.1 | 0.09 | 721.1 |
| Exp2.4 | 0.08 | 576.6 |
| Exp2.7 | 0.12 | 822.5 |
| Exp2.10 | 0.11 | 772.2 |
| Exp2.13 | 0.20/2 | 857.7 |
| Exp2.16 | 0.08 | 602.4 |
| Exp2.19 | 0.03 | 240.5 |
| Exp2.22 | 0.09 | 654.3 |
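Since each pellet was dissolved in 200 µl, the concentrations in the table can be converted to total RNA yield and yield per mg of tissue. This is a rough Python sketch only. It assumes the full 200 µl was recovered and that half of the 0.20 g Exp2.13 tissue (0.10 g) was extracted; the function names are illustrative.

```python
RESUSPENSION_UL = 200  # volume of DEPC H2O added to each pellet

def total_yield_ug(conc_ng_per_ul, volume_ul=RESUSPENSION_UL):
    """Total RNA (µg), assuming the whole resuspension volume is recovered."""
    return conc_ng_per_ul * volume_ul / 1000

def yield_per_mg(conc_ng_per_ul, tissue_g):
    """RNA yield in µg per mg of tissue."""
    return total_yield_ug(conc_ng_per_ul) / (tissue_g * 1000)

# (tissue mass in g, average ng/µl) from the table above; Exp2.13 entered as 0.20/2 = 0.10 g.
samples = {
    "Exp2.1": (0.09, 721.1),
    "Exp2.13": (0.10, 857.7),
    "Exp2.19": (0.03, 240.5),
}
for name, (mass_g, conc) in samples.items():
    print(f"{name}: {total_yield_ug(conc):.1f} µg total, {yield_per_mg(conc, mass_g):.2f} µg/mg")
```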
Mukilteo water chemistry
Made new dye: 0.032 g m-cresol purple in 40 mL nanopure water. Added 5 µL 5N NaOH, which was a little too much (add 4 µL next time). Corrected with 1.5 µL HCl (conc. unknown) for A1/A2 of 1.71.
Spec pH of 9 samples: source water from GHA, GHB, and Lab as well as 2 tanks from each. Did double dye addition for 3 samples to contribute towards dye correction that will be done next week.
Bioinformatics: Pinto Ab NGS
SR showed me how to do blastx on the cluster (he started one for pinto ab transcriptome). Once logged on to node 2, ncbi blast program is in the main directory, so hit "ls" to get exact name. cd to ncbi blastx and then cd to bin (this brings you inside the directory). Run code as shown in SR's lab notebook for 6/15/12.
June 14, 2012
Bioinformatics: Pinto Ab NGS
Downloaded SR's annotation of Oly contigs with swissprot ID and GO Slim terms (downloaded the data set that has already been filtered to include only e-values of 1e-5 or lower). This file includes 77,384 entries. Filtered so that it's just biological processes and then filtered so that only unique entries remained (19,191 entries). Made a pivot table of the remaining GO Slim terms. Removed "other metabolic processes" and "other biological processes" and 14,660 annotations remained. Made a pie chart.
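The filtering and pivot steps above were done in a spreadsheet; a minimal Python sketch of the same idea follows. The rows are invented for illustration, and it assumes that "unique entries" means unique contig and GO Slim term pairs and that biological process is flagged with "P" in the annotation file.

```python
from collections import Counter

# Hypothetical rows: (contig, GO Slim term, GO aspect). "P" = biological process.
rows = [
    ("contig_1", "transport", "P"),
    ("contig_1", "transport", "P"),          # duplicate contig-term pair
    ("contig_1", "stress response", "P"),
    ("contig_2", "other biological processes", "P"),
    ("contig_2", "nucleus", "C"),            # cellular component, dropped
    ("contig_3", "transport", "P"),
]

EXCLUDED = {"other metabolic processes", "other biological processes"}


def goslim_counts(rows):
    """Count unique contig-term pairs per biological-process GO Slim term."""
    unique_pairs = {(contig, term) for contig, term, aspect in rows if aspect == "P"}
    return Counter(term for _, term in unique_pairs if term not in EXCLUDED)


counts = goslim_counts(rows)
for term, n in counts.most_common():
    print(term, n)
```

The resulting counts are what would feed the pie chart wedges.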
Downloaded the Oly SPID annotated file that has not been filtered by e-value, edited it so that it contained just contig name, SPID, and e-value, and uploaded it to Galaxy.
Also uploaded Oly contigs blasted to species-specific databases: C. gigas, pearl oyster, Manila clam, Haliotis midae, and zebrafish.
Joined datasets based on Oly contig number in the order listed above. Every new blast result dataset is joined with the original Oly dataset (always keep the original entry even if it doesn't match). File for Oly joined to C. gigas is Galaxy 103, with added pearl oyster is Galaxy 104, with added Manila clam (RuphiBase) is Galaxy 105, with added H. midae is Galaxy 106, and with added zebrafish is Galaxy 107.
Within Excel, filtered the blast hits so that only those with an e-value of 1e-5 or lower remain. For Oly contigs matching SPIDs, this is 15,918 entries. For Oly contigs annotated by C. gigas contigs = 13,064 entries, annotated by pearl oyster = 9,078, Manila clam = 1,543, H. midae = 1,976, and Danio = 6,081.
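The join-then-filter logic can be sketched in Python as below. This is not the Galaxy tool itself, just an illustration with made-up contig names and hits: every Oly contig is kept (a left join), and a hit counts only if its e-value is at or below the 1e-5 cutoff.

```python
# Hypothetical tables keyed by Oly contig name: contig -> (hit ID, e-value).
oly_spid = {
    "oly_1": ("SPID_A", 1e-30),
    "oly_2": ("SPID_B", 0.5),
    "oly_3": ("SPID_C", 1e-8),
}
gigas_hits = {"oly_1": ("gigas_X", 1e-12), "oly_3": ("gigas_Y", 2e-3)}

EVALUE_CUTOFF = 1e-5


def left_join(base, other):
    """Keep every contig in `base`; attach the `other` hit, or None if absent."""
    return {contig: (hit, other.get(contig)) for contig, hit in base.items()}


def significant(hit, cutoff=EVALUE_CUTOFF):
    """True when a hit exists and its e-value is at or below the cutoff."""
    return hit is not None and hit[1] <= cutoff


joined = left_join(oly_spid, gigas_hits)
n_spid = sum(significant(spid) for spid, _ in joined.values())
n_gigas = sum(significant(gigas) for _, gigas in joined.values())
print(n_spid, n_gigas)
```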
Made Venn diagrams of gigas vs. pearl vs. clam annotations of Oly contigs; of gigas vs. pearl vs. H. midae; and of gigas vs. pearl vs. Danio. For each diagram, the order in which the taxa are listed corresponds to a, b, and c (pink, green, blue). Below is the Venn for gigas vs. pearl vs. Danio.
June 13, 2012
Bioinformatics: Pinto Ab NGS
SR did more blasts of pinto ab against the Oly NGS data, zebrafish, RuphiBase, pearl oyster, C. gigas, and H. midae. See his notebook Pinto Ab - Blast 6/13/12.
Oly assembly here
Pinto assembly here
Oly swiss prot and GO annotations here
On wetgenes did blastx of Pinto assembly against swissprot database.
Secondary stress
Analyzed chemistry data for the 1 month exposure (1/14 through 2/11/12). Calculated average and SD values for salinity, TA, pCO2, pH, saturation states of calcite and aragonite, and concentration of CO3.
Treatment | salinity | TA | pCO2 | pH | calcite | aragonite | carbonate
101B avg | 29.9 | 2085.6 | 2848 | 7.25 | 0.54 | 0.34 | 22.1
101B SD | 0.22 | 14.9 | 873 | 0.13 | 0.15 | 0.10 | 6.6
102B avg | 29.9 | 2083.1 | 648 | 7.85 | 1.96 | 1.24 | 79.6
102B SD | 0.30 | 16.9 | 51 | 0.031 | 0.11 | 0.068 | 4.3
103A avg | 29.8 | 2085.7 | 1182 | 7.60 | 1.16 | 0.73 | 47.0
103A SD | 0.26 | 13.8 | 118 | 0.041 | 0.10 | 0.065 | 4.2
103B avg | 29.9 | 2085.4 | 427 | 8.01 | 2.70 | 1.71 | 109.9
103B SD | 0.2 | 15.9 | 33 | 0.029 | 0.15 | 0.091 | 5.8
104A avg | 29.9 | 1086.4 | 810 | 7.75 | 1.605 | 1.01 | 65.3
104A SD | 0.2 | 12.1 | 61 | 0.030 | 0.106 | 0.067 | 4.3
104B avg | 29.9 | 2084.9 | 991 | 7.67 | 1.34 | 0.85 | 54.7
104B SD | 0.3 | 14.3 | 10 | 0.006 | 0.029 | 0.018 | 1.2
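For reference, the avg and SD rows are the kind of summary sketched below. The readings here are invented (the per-sample values are not listed in this entry), and the sketch assumes the sample standard deviation (n - 1 denominator), which may not be what the spreadsheet used.

```python
from statistics import mean, stdev

# Hypothetical pH readings for one treatment over the exposure.
ph_readings = [7.99, 8.03, 8.01, 8.04, 7.98]


def summarize(values):
    """Return (average, sample standard deviation) for a list of measurements."""
    return mean(values), stdev(values)


avg, sd = summarize(ph_readings)
print(f"avg = {avg:.2f}, SD = {sd:.3f}")
```

The same helper would apply unchanged to salinity, TA, pCO2, and the other columns.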
Below is a graph comparing the pH profile (from the durafet probes) for 103B (ambient, blue) and 101B (most elevated, red).
June 12, 2012
Bioinformatics: Pinto Ab NGS
Downloaded assembled transcripts from Patiria miniata off of Baylor's urchin genome project website.
SR trimmed and assembled Pinto ab sequences to match parameters used for Oly larval assembly. Blasted Oly assembly against databases of sequence from pearl oyster, C. gigas, Manila clam, and Haliotis midae (see results here).
If another gastropod transcriptome is needed, here is a snail one.
June 1, 2012
Bioinformatics: Pinto Ab NGS
Took the blastall file from 4/30/12 (de novo 7 blasted against swissprot) and put only SPIDs in the second column. Uploaded to Galaxy (blastall DN7-SPID) and joined with the swissprot associations file and then with GO to GO Slim terms (Galaxy 93). Removed redundant entries and filtered by e-value to retain only non-redundant (39,697 contigs) that were annotated in SwissProt at the cut-off (6,759 contigs). From these remaining contigs, made a pivot table of the GO Slim terms corresponding to biological processes (1,532 contigs) and the pie chart below (1,101 contigs fit all these criteria and were annotated by GO Slim).
May 24, 2012
Secondary Stress: Histology
Sent off preliminary samples for histological prep. Samples are from 2/11/12. Picked 2 samples each from control (400 µatm), control + mechanical stress, 1400 µatm, 1400 µatm + mechanical. The sample numbers are:
control: H2.90, H2.95
control + mech: H2.82, H2.86
1400: H2.6, H2.19
1400 + mech: H2.10, H2.16
May 11, 2012
Bioinformatics: Pinto Ab NGS
Joined the haliotis_evalue file (contigs annotated by the haliotis db with an e-value cutoff of 1e-5) with the crassostrea_evalue file and then with the strongylocentrotus_evalue file (crassostrea and strongylocentrotus were added based on the haliotis accession number). Joined file = Galaxy 89. Also joined crassostrea and strongylocentrotus (Galaxy 90). There's an overlap of 488 contigs between haliotis and crassostrea, 339 between haliotis and strongylocentrotus, and 290 between all 3. There is a 314 contig overlap between crassostrea and strongylocentrotus annotations. For the input file to make a Venn diagram, a = haliotis, b = crassostrea, c = strongylocentrotus.
In the Venn diagram, the black numbers are the total number of contigs annotated by the individual databases. The gray italic numbers are the number of contigs that are annotated by 2 or more databases.
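The pairwise overlaps reported above can be split into the regions shared by exactly two databases by subtracting the three-way overlap. This is a small arithmetic sketch; it assumes each reported pairwise overlap includes the 290 contigs shared by all 3, and the function name is illustrative.

```python
def exclusive_regions(ab, ac, bc, abc):
    """Split pairwise overlaps into regions shared by exactly two sets."""
    return {
        "a&b only": ab - abc,
        "a&c only": ac - abc,
        "b&c only": bc - abc,
        "a&b&c": abc,
    }


# a = haliotis, b = crassostrea, c = strongylocentrotus (overlaps reported above)
regions = exclusive_regions(ab=488, ac=339, bc=314, abc=290)
print(regions)
```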
May 8, 2012
Bioinformatics: Pinto Ab NGS
Did local blast in clc of de novo 7 consensus sequences against H. sapiens ref seq db, S. purpuratus all ESTs db, and C. gigas ESTs from Sigenae db. Parameters used are described 4/24/12.
Exported the MultiBLAST results for each db blast from CLC. For each result file, only the blast result with the lowest e-value for each contig is used. Results were filtered by e-value, and downstream analyses only make use of contigs that blasted to the db with an e-value of 1e-5 or lower (see table below).
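A minimal sketch of the "best hit per contig, then e-value cutoff" rule is shown here. The hits are invented and the three-column layout is an assumption; the actual CLC export has more columns.

```python
# Hypothetical blast rows: (query contig, subject ID, e-value).
hits = [
    ("contig_1", "subj_A", 1e-20),
    ("contig_1", "subj_B", 1e-50),
    ("contig_2", "subj_C", 0.01),
    ("contig_3", "subj_D", 1e-5),
]

EVALUE_CUTOFF = 1e-5


def best_hits(hits):
    """Keep the hit with the lowest e-value for each query contig."""
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


best = best_hits(hits)
kept = {q: h for q, h in best.items() if h[1] <= EVALUE_CUTOFF}
print(len(best), "contigs with any hit;", len(kept), "at or below the cutoff")
```

The two counts correspond to the "total annotations" and "with e-value cutoff" columns of the summary table.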
Uploaded files of contigs resulting from the D. rerio and H. sapiens multiblasts into Galaxy (according to described specifications). Joined the tables by pinto abalone contig number to see if there is overlap between the annotations (Galaxy 79). There was an overlap of 239 contigs between the 2 blast results. Then did the same for the Haliotis multiblast results and compared the contigs annotated to the H. sapiens file and the D. rerio file. Compared all 3 by joining Haliotis with H. sapiens and then joining with D. rerio. The Haliotis db annotated 292 contigs in common with the H. sapiens db and 321 in common with D. rerio. There was a 228 contig overlap among all 3 databases. In the Venn diagram, Haliotis annotations = a, H. sapiens = b, and D. rerio = c.
Database | Total annotations | With e-value cutoff
H. sapiens | 7204 | 332
D. rerio | 8236 | 349
S. purpuratus | 7697 | 380
C. gigas | 8742 | 594
Haliotis | 8857 | 3519
May 7, 2012
Mukilteo water chemistry
Did spec pH for source water for lab and GHB. Sammi had made the dye 5/5/12. Will do dye correction tomorrow.
May 3, 2012
Ceramide: vibrio gene expression
Redid t-tests in R comparing expression in control and treatment. ACMase is still differentially expressed (p<0.05), but 3KDSR is also differentially expressed. Using a Box-Cox plot to check for skewness, ACMase and CgT needed to be transformed (reciprocal and square root transformations, respectively). T-tests after data transformations resulted in the same relationship between control and exposed expression: significantly different for ACMase and not different for CgT.
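The analysis itself was run in R. The sketch below only illustrates the transform-then-test idea in Python with invented expression values; it assumes Welch's unequal-variance t-test (the default of R's `t.test`) and stops at the t statistic and degrees of freedom rather than a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical relative expression values; the real qPCR data are not shown here.
control = [1.0, 1.2, 0.9, 1.1]
exposed = [2.0, 2.5, 1.8, 2.2]


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Reciprocal transformation (the one used for ACMase) applied before the test.
t_recip, df_recip = welch_t([1 / v for v in control], [1 / v for v in exposed])
# Square root transformation (the one used for CgT), shown on the same toy data.
t_sqrt, df_sqrt = welch_t([sqrt(v) for v in control], [sqrt(v) for v in exposed])
print(f"reciprocal: t = {t_recip:.2f}, df = {df_recip:.1f}")
print(f"square root: t = {t_sqrt:.2f}, df = {df_sqrt:.1f}")
```

Note that the reciprocal transformation reverses the ordering of the groups, so the sign of t flips relative to the untransformed data.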
Bioinformatics: Pinto Ab NGS
I helped Miranda and Selina design primers for genes involved in the immune response and calcification in the pinto abalone. We found genes homologous to ferritin, perlinhibin, proteasome subunit alpha, and sodium bicarbonate transporter in the NGS data and designed primers for all of them. I'm going to design EF1a primers for them. I downloaded Haliotis EF1a mRNA from NCBI: H. rufescens (DQ087488), H. tuberculata (FN566842), H. diversicolor (EF553516), and H. diversicolor (AY953390). Imported the sequences into Geneious and assembled them (all 4 assembled together). Made a consensus sequence from the assembly. Imported consensus sequences from the de novo 7 assembly into the "Haliotis" folder. The best hit was contig 7244, which covered the entire 1610 bp of the consensus sequence from the assembled multi-Haliotis EF1a. The designed primers overlap with the area of contig 7244 that aligns with the Haliotis consensus EF1a.
May 2, 2012
Bioinformatics: Pinto Ab NGS
Joined MultiBLAST tables with SPIDs with GO and GO Slim terms in Galaxy for the de novo 7 and SR assemblies of pinto ab data. Exported these tables and made pivot tables of the GO Slim terms for just biological processes (have not tried to remove redundancy or filter for e-value, except for the original MultiBLAST in clc with a cutoff of e=1e-5). The SR assembly had 4077 contigs annotated to GO Slim and 2913 once other biological and metabolic processes were removed. 12 GO Slim categories are represented in the data.
The de novo 7 assembly had 4366 contigs, 3152 without other biological and metabolic processes. 12 GO Slim categories are represented. The de novo 7 assembly had more contigs in all categories (compared to SR's assembly) except for cell-cell signaling (equal) and death (fewer); see the table below.
GO Slim category | de novo 7 | SR
cell adhesion | 90 | 80
cell cycle and proliferation | 110 | 104
cell organization and biogenesis | 240 | 232
cell-cell signaling | 2 | 2
death | 45 | 52
developmental processes | 250 | 199
DNA metabolism | 144 | 126
protein metabolism | 254 | 246
RNA metabolism | 776 | 740
signal transduction | 299 | 215
stress response | 213 | 210
transport | 729 | 707
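The table can be checked against the totals quoted in the text (3152 for de novo 7 and 2913 for SR) with a few lines of Python. This is a quick consistency sketch using the counts above; the variable names are illustrative.

```python
# GO Slim counts from the table: category -> (de novo 7, SR).
counts = {
    "cell adhesion": (90, 80),
    "cell cycle and proliferation": (110, 104),
    "cell organization and biogenesis": (240, 232),
    "cell-cell signaling": (2, 2),
    "death": (45, 52),
    "developmental processes": (250, 199),
    "DNA metabolism": (144, 126),
    "protein metabolism": (254, 246),
    "RNA metabolism": (776, 740),
    "signal transduction": (299, 215),
    "stress response": (213, 210),
    "transport": (729, 707),
}

total_dn7 = sum(dn7 for dn7, _ in counts.values())
total_sr = sum(sr for _, sr in counts.values())
# Categories where de novo 7 does not have more contigs than SR.
not_higher = [cat for cat, (dn7, sr) in counts.items() if dn7 <= sr]

print(total_dn7, total_sr)
print(not_higher)
```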
April 30, 2012
Bioinformatics: Pinto Ab NGS
Downloaded blastall results from wetgenes. Uploaded blastall files and MultiBLAST files (only e-value less than or equal to 1e-5 for the latter) into Galaxy. Joined blastall results with MultiBLAST results for de novo 7 and SR assembly, as described 4/16/12.
For de novo 7: 8281 contigs from the assembly were annotated with SPIDs in blastall. 561 of the contigs from the significant MultiBLAST file did not match to contigs annotated by SPIDs in blastall.
For SR assembly: 7712 contigs from the assembly were annotated with SPIDs in blastall. 469 of the contigs from the significant MultiBLAST did not match to contigs annotated by SPIDs in blastall.
April 27, 2012
Bioinformatics: Pinto Ab NGS
Results from blast of SR's assembly to Haliotis db (4/24/12): 8291 sequences matched to the db.
To compare performance of the 2 assemblies (SR's and de novo 7), exported consensus sequences and uploaded into wetgenes to do blastalls (blastx) against swissprot. The results will be uploaded into Galaxy and joined with the contig names that correspond to a blast hit (against the Haliotis db) with an e-value of 1e-5 or lower. This will show the breadth of coverage achieved by both assemblies. Will also compare the number of contigs returned by each one (total and corresponding to an e-value of 1e-5 or lower).
Total contigs = contigs assembled in de novo assembly
Contigs @ 1e-5 = contigs that matched to ESTs in Haliotis db with an e-value of 1e-5 or lower
# GO Slim categories = the number of unique categories that correspond to contigs with an e-value of 1e-5 or lower
Assembly | Total contigs | Contigs @ 1e-5 | # GO Slim categories
ETS de novo 7 | 8857 | 3519 | see 5/2
SR | 8291 | 3244 | see 5/2
April 25, 2012
Bioinformatics: Pinto Ab NGS
The blasts didn't work from yesterday, so redid them. Most contigs (n=9,294) matched to contigs in the Haliotis and D. rerio databases (n= 8857 and 8236, respectively). Exported blast results.
Blastn with same parameters described 4/24/12 of pinto ab consensus seqs from de novo 7 against databases of H. sapiens RefSeq and C. gigas Sigenae contigs and S. purpuratus contigs from NCBI. (these failed, will try again later)
Downloaded SR's assembly (see his notebook, Pinto Ab data analysis, for the file): Abalone_pinto_v5beta_8673.fa. This assembly used >75 million reads to form 8673 contigs. Did a local blast against the all-Haliotis db. Blast parameters were the same as described 4/24/12.
April 24, 2012
Bioinformatics: Pinto Ab NGS
For de novo assembly 7, ~17.5 million reads matched to make 9,294 contigs. This is one of the best results yet (except for de novo 3, but the min contig length was only 100 bases for that one).
Downloaded SR's file of assembled all Haliotis ESTs from GenBank (file = Haliotis_comboNCBI_cdhit selection.fa) and uploaded as a database for local blast searches to clc.
Extracted consensus sequences from de novo assembly 7. Did a blastn against the Haliotis database: low complexity, expect 1, word size 11, no. of processors 2, match 1, mismatch -3, gap cost open 5 and extension 2, create overview blast table. Did the same to start a blast search against Danio rerio RefSeq database.
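The blast above was run through the CLC interface. For comparison, the same settings map onto NCBI BLAST+ `blastn` options roughly as sketched below. This only builds the argument list; the file and database names are placeholders, and whether CLC passes exactly these options internally is an assumption.

```python
def blastn_args(query, db, out):
    """Build a BLAST+ blastn argument list mirroring the CLC settings above."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-out", out,
        "-dust", "yes",        # low-complexity filter
        "-evalue", "1",        # expect = 1
        "-word_size", "11",
        "-reward", "1",        # match score
        "-penalty", "-3",      # mismatch score
        "-gapopen", "5",
        "-gapextend", "2",
        "-num_threads", "2",   # no. of processors
        "-outfmt", "6",        # tabular output
    ]


# File and database names below are placeholders, not the actual files.
args = blastn_args("denovo7_consensus.fa", "haliotis_db", "denovo7_vs_haliotis.tsv")
print(" ".join(args))
```

The list could be handed to `subprocess.run(args, check=True)` on a machine where BLAST+ is installed and the database has been built.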
April 23, 2012
Bioinformatics: Pinto Ab NGS
For de novo assembly 6, ~17 million reads were assembled to make 9,296 contigs.
De novo assembly 7: mismatch cost = 1, limit = 7, no fast ungapped alignment, insertion cost = 3, deletion cost = 3, vote for conflict resolution, ignore non-specific matches, min contig length = 200, map reads back to contigs.
April 20, 2012
Bioinformatics: Pinto Ab NGS
Made local blast db on galaxy of all H. sapiens ESTs downloaded yesterday.
Summary of results for de novo assemblies 3 and 4 (done 4/19/12):
Assembly | Reads | Matched | Not matched | Contigs
de novo 3 | 94,777,799 | 25,459,866 | 69,317,933 | 26,942
de novo 4 | 94,777,799 | 13,541,271 | 81,236,528 | 9,294
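To make the two assemblies easier to compare, the read counts in the table can be expressed as the percentage of reads matched. This is a short sketch using only the numbers above; the function name is illustrative.

```python
# Read counts from the summary table.
assemblies = {
    "de novo 3": {"reads": 94_777_799, "matched": 25_459_866, "contigs": 26_942},
    "de novo 4": {"reads": 94_777_799, "matched": 13_541_271, "contigs": 9_294},
}


def percent_matched(entry):
    """Percentage of input reads that were incorporated into contigs."""
    return 100.0 * entry["matched"] / entry["reads"]


pct = {name: percent_matched(entry) for name, entry in assemblies.items()}
for name, value in pct.items():
    print(f"{name}: {value:.2f}% of reads matched")
```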
Started new de novo (de novo 5) with mismatch cost = 2, limit = 8, no fast ungapped alignment, insertion cost = 3, deletion cost = 3, vote for conflict resolution, ignore non-specific matches, min contig length = 200, map reads back to contigs. Only ~13 million reads assembled, but it looks like the limit was set at 1, not 8. Started a new de novo (de novo 6) with the parameters that should have been used for de novo 5.
Downloaded Haliotis midae transcriptome (referenced here) as well as transcriptomic data for larval snail (paper and data on NCBI). Possible D. rerio transcriptomic data found in this paper and here on NCBI. For information on assembled sea star transcriptome, see here.
Downloaded H. sapiens and D. rerio rna.fna files from the RefSeq server.
Made local BLAST databases of the H. sapiens and D. rerio RefSeq data.
SR downloaded all Haliotis ESTs and mRNA and is making a non-redundant db.
April 19, 2012
Bioinformatics: Pinto Ab NGS
Previous download of H. sapiens did not work so reinitiated download of all H. sapiens ESTs from GenBank.
New de novo assemblies of trimmed pinto reads since the de novo previously used assembled relatively few reads and we'd like to capture more of the data. For de novo 3: mismatch cost=1, limit=6, fast ungapped alignment, conflict resolution by voting, ignore non-specific matches, min contig length = 100, map reads back to contigs and create a summary report. For de novo assembly 4, used all the same parameters as 3 except mismatch cost = 1, limit = 1, and min contig length = 200.
SR previously did an assembly of this data and got better results using the following parameters:
April 15, 2012
Bioinformatics: Pinto Ab NGS
Uploaded best e-value contig numbers to Galaxy for annotations by C. gigas and S. purpuratus datasets. Joined files to find overlaps of annotations. The light blue represents contigs annotated by C. gigas and dark green are those annotated by S. purpuratus (gray is overlap).
Began download of H. sapiens ESTs from GenBank.
April 14, 2012
Mukilteo water chem
Repeated set up of standards as described 4/13 and 4/14. Ran remaining Muk water samples for DIC and TA.
April 14, 2012
Mukilteo water chem
The computer had logged the pCO2 of the bubbled standards overnight and there was good separation between all 3: low, mid, high. Flushed the syringe twice and emptied tube before starting the first sample (started with mid). Ran all 3 samples 3-4 times, recording the temp and salinity and then ran CRM 116. Turned on all the TA equipment and water bath for 1/2 hour to warm up. Ran all 2 junks and all 3 refs on TA and calculated DIC from known pCO2 and TA. Used the reference average total sums to calculate the total sum-pCO2 relationship.
Ran 24 samples for DIC and 23 for TA.
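The total sum-pCO2 relationship from the three bubbled standards can be sketched as a least-squares line. The numbers below are invented, and a straight-line relationship is an assumption; the actual calculation may have been set up differently in the instrument spreadsheet.

```python
# Hypothetical standards: known pCO2 and instrument total sum (values invented
# for illustration; units are whatever the instrument reports).
pco2 = [400.0, 800.0, 1600.0]
total_sum = [1000.0, 1500.0, 2500.0]


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = linear_fit(pco2, total_sum)


def predict_pco2(sample_total_sum):
    """Invert the standard curve to estimate pCO2 for a sample reading."""
    return (sample_total_sum - intercept) / slope


print(slope, intercept, predict_pco2(2000.0))
```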
April 13, 2012
Mukilteo water chem
When I arrived at FHL, Cory had been using the DIC and TA instruments all day so they were calibrated and warmed up. I ran 3 samples on both instruments (DIC first, then TA): Muk lab #10 4/10/12 5:40 pm, Muk lab #10 3/26 12:57 pm, Muk GHA in 3/26/12 1:00 pm. For the first sample on DIC, there were really large peaks pre-injection of sample. I figured out that the syringe needs to be flushed 2 times (and the tube emptied) between each injection (as well as between samples, i.e. inter- and intrasample). For each sample, DIC is measured 3 or 4 independent times, with 2 syringe flushes in between each measurement. Cory had some CRM 116 left over from earlier in the day so I ran a CRM at the beginning and end of my 3 experimental TA samples.
In the evening (~7:30 pm), I zeroed the reference on the Licor and set up the standards for tomorrow. The standards are 3 seawater samples of 1800 mL each, bubbled overnight with a known concentration of CO2. These are run first thing on the DIC, followed by a CRM, to make a standard curve.
April 12, 2012
Bioinformatics: Pinto Ab NGS
The blast in clc done yesterday didn't work. Downloaded Sigenae v8 ESTs from crassostreome and uploaded into clc to make a local db for blasting (saved in folder trimmed de novo blasts). Imported external file to make db, sequence type nucleotide. Did a local blast: selected consensus sequences from de novo assembly of both pinto ab libraries, chose blastn, used sigenae v8 as the db, low complexity, expect 1, word size 11, no. of processors 2, match/mismatch 1 and -3, gap cost open 5 and gap cost extension 2, create overview blast table and create one blast result per query.
Downloaded all mRNA sequences for Strongylocentrotus purpuratus and repeated the above steps to do a local blast against a purple sea urchin database (except did not create one blast results per query).
Downloaded all mRNA sequences for Danio rerio from NCBI and did a local blast on clc.
For analyses of the blast results in Galaxy, only the blast hit with the lowest e-value will be used (as opposed to greatest identity %, greatest positive %, or greatest hit length). Took accession numbers from best e-value hit for C. gigas blast results and put them into blastall on the INquiry portal: blastx, swissprot, 1 short description, 1 alignment, tabular output. Did the same for S. purpuratus.
April 11, 2012
Bioinformatics: Pinto Ab NGS
From assembly done yesterday, 5476 references resulted. Extracted consensus sequence from these refs and exported FASTA file. Imported FASTA into blastall app on wet genes: blastx against swissprot, tabular output, 1 short description, 1 alignment.
Followed workflow for H.asi backbone described 2/16/12. Exported Galaxy file is called Galaxy 51 - Hasi assembly of trimmed reads. The pie chart is of GO Slim terms from contigs that were annotated with GO biological processes and meet the e-value cutoff of 1e-5.
NCBI blast using clc of de novo assembly of trimmed pinto ab reads. Blastx against swissprot. Limit entrez query by Homo sapiens, low complexity, expect 10, word size 3, BLOSUM62 matrix, gap cost existence: 11 extension: 1, create one blast result per query. Saved in folder "trimmed de novo blasts".
April 10, 2012
Bioinformatics: Pinto Ab NGS
Began workflow of using trimmed reads (trimmed by SR, see 2/15/12) to assemble to reference H. asinina backbone. In clc, mapped trimmed reads from both libraries - air and high co2 - back to Hasi backbone (Hasi ESTs NCBI 013112). Mismatch cost = 2, limit = 8, fast ungapped alignment, add conflict annotations, vote resolution, random non-specific matches, create summary report and list of unmapped reads. The assembled sequences will be in the folder, "trimmed reads mapped".